Abstract

Calibrated strategies can be obtained by performing strategies that have no internal regret in some auxiliary game. Such strategies can be constructed explicitly with the use of Blackwell's approachability theorem, in another auxiliary game. We establish the converse: a strategy that approaches a convex B-set can be derived from the construction of a calibrated strategy.

We develop these tools in the framework of a game with partial 
monitoring, where players do not observe the actions of their opponents 
but receive random signals, to define a notion of internal regret and 
construct strategies that have no such regret. 

Key Words: Repeated Games; Partial Monitoring; Regret; Calibration; Blackwell's approachability

Introduction 

Calibration, approachability and regret are three notions widely used both in game theory and machine learning. There are, at first glance, no obvious links between them. Indeed, calibration was introduced by Dawid [8] for repeated games of predictions: at each stage, Nature chooses an outcome $s$ in a finite set $S$ and Predictor forecasts it by announcing, stage by stage, a probability over $S$. A strategy is calibrated if the empirical distribution of outcomes on the set of stages where Predictor made a specific forecast is close to that forecast. Foster and Vohra [9] proved the existence of such strategies.
Approachability was introduced by Blackwell [3] in two-person repeated games where, at each stage, the payoff is a vector: a player can approach a given set $E \subset \mathbb{R}^d$ if he can ensure that, after some stage and with high probability, the average payoff will always remain close to $E$. This is possible, see Blackwell [3], as soon as $E$ satisfies some geometrical condition (it is then called a B-set), and this gives a full characterization in the special case of convex sets. No-regret was introduced by Hannan [12] for two-person repeated games with payoffs in $\mathbb{R}$: a player has no external regret if his average payoff could not have been asymptotically better had he known in advance the empirical distribution of moves of the other player. The existence of such strategies was also proved by Hannan [12].

Blackwell [4] (see also Luce and Raiffa [18], A.8.6 and Mertens, Sorin and Zamir [21], Exercise 7 p. 107) was the first to notice that the existence of externally consistent strategies (strategies that have no external regret) can be proved using his approachability theorem. As shown by Hart and Mas-Colell [13], the use of Blackwell's theorem actually gives not only the existence of externally consistent strategies but also a construction of strategies that fulfill a stronger property, called internal consistency: a player has asymptotically no internal regret if, for each of his actions, he has no external regret on the set of stages where he played it (as long as this set has a positive density). This more precise definition of regret was introduced by Foster and Vohra [10] (see also Fudenberg and Levine [11]).

Foster and Vohra [9] (see also Sorin [28] for a shorter proof) constructed a calibrated strategy by computing, in an auxiliary game, a strategy with no internal regret. These results are recalled in Section 1, and we also refer to Cesa-Bianchi and Lugosi [5] for a more complete survey on sequential prediction and regret.

We provide in Section 1.5 a kind of converse result by constructing an explicit ε-approachability strategy for a convex B-set through the use of a calibrated strategy in some auxiliary game. This last statement proves that the construction of an approachability strategy of a convex set can be deduced from the construction of a calibrated strategy, which is deduced from the construction of an internally consistent strategy, itself deduced from the construction of an approachability strategy. So calibration, regret and approachability are, in some sense, equivalent.

In Section 2 we consider repeated games with partial monitoring, i.e. where players do not observe the actions of their opponents but receive random signals. The idea behind the proof that, in the full monitoring case, approachability follows from calibration can be extended to this new framework to construct consistent strategies in the following sense. A player has asymptotically no external regret if his average payoff could not have been better had he known in advance the empirical distribution of signals (see Rustichini [25]). The existence of strategies with no external regret was proved by Rustichini [25], while Lugosi, Mannor and Stoltz [19] constructed such strategies explicitly. The notion of internal regret was introduced by Lehrer and Solan [17], who proved the existence of consistent strategies. Our main result is the construction of such strategies even when the signal depends on the action played. We show in Section 3 that our algorithm also works when the opponent is not restricted to a finite number of actions, discuss our assumption on the regularity of the payoff function (see Assumption 1) and extend our framework to more general cases.

1 Full monitoring case: from approachability to calibration

We recall the main results about calibration of Foster and Vohra [9], approachability of Blackwell [3] and regret of Hart and Mas-Colell [13]. We will prove some of these results in detail, since they give the main ideas about the construction of strategies in the partial monitoring framework, given in Section 2.

1.1 Calibration 

We consider a two-person repeated game where, at stage $n \in \mathbb{N}$, Nature (Player 2) chooses an outcome $s_n$ in a finite set $S$ and Predictor (Player 1) forecasts it by choosing $\mu_n$ in $\Delta(S)$, the set of probabilities over $S$. We assume furthermore that $\mu_n$ belongs to a finite set $\mathcal{M} = \{\mu(l), l \in L\}$. The prediction at stage $n$ is then the choice of an element $l_n \in L$, called the type of that stage. The choices of $l_n$ and $s_n$ depend on the past observations $h_{n-1} = (l_1, s_1, \ldots, l_{n-1}, s_{n-1})$ and may be random. Explicitly, the set of finite histories is denoted by $H = \bigcup_{n \in \mathbb{N}} (L \times S)^n$, with $(L \times S)^0 = \emptyset$, and a behavioral strategy $\sigma$ of Player 1 is a mapping from $H$ to $\Delta(L)$. Given a finite history $h_n \in (L \times S)^n$, $\sigma(h_n)$ is the law of $l_{n+1}$. A strategy $\tau$ of Nature is defined similarly as a mapping from $H$ to $\Delta(S)$. A couple of strategies $(\sigma, \tau)$ generates a probability, denoted by $\mathbb{P}_{\sigma,\tau}$, over $\mathcal{H} = (L \times S)^{\mathbb{N}}$, the set of plays endowed with the cylinder $\sigma$-field.

We will use the following notations. For any families $a = \{a_m \in \mathbb{R}^d\}_{m \in \mathbb{N}}$ and $l = \{l_m \in L\}_{m \in \mathbb{N}}$ and any integer $n \in \mathbb{N}$, $N_n(l) = \{1 \le m \le n, l_m = l\}$ is the set of stages of type $l$ (before the $n$-th), $\bar{a}_n(l) = \frac{1}{|N_n(l)|}\sum_{m \in N_n(l)} a_m$ is the average of $a$ on this set and $\bar{a}_n = \frac{1}{n}\sum_{m=1}^{n} a_m$ is the average of $a$ over the first $n$ stages.

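Definition 1.1 (Dawid [8]) A strategy $\sigma$ of Player 1 is calibrated with respect to $\mathcal{M}$ if for every $l \in L$ and every strategy $\tau$ of Player 2:

$$\limsup_{n \to +\infty} \frac{|N_n(l)|}{n}\left(\|\bar{s}_n(l) - \mu(l)\|_2^2 - \|\bar{s}_n(l) - \mu(k)\|_2^2\right) \le 0, \quad \forall k \in L, \ \mathbb{P}_{\sigma,\tau}\text{-a.s.}, \qquad (1)$$

where each outcome $s_m$ is identified with the Dirac mass $\delta_{s_m} \in \Delta(S) \subset \mathbb{R}^{|S|}$.

In words, a strategy of Player 1 is calibrated with respect to $\mathcal{M}$ if $\bar{s}_n(l)$, the empirical distribution of outcomes when $\mu(l)$ was predicted, is asymptotically closer to $\mu(l)$ than to any other $\mu(k)$ (or conversely, $\mu(l)$ is the closest possible prediction to $\bar{s}_n(l)$), as long as $|N_n(l)|/n$, the frequency of $l$, does not go to 0. Foster and Vohra [9] proved the existence of such strategies with an algorithm based on the Expected Brier Score.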
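As an illustration, the quantity inside the limsup of (1) can be computed on a finite history. The sketch below is a hypothetical helper (`calibration_gaps` is our name, not from the source), assuming types and outcomes are encoded as integers and the forecasts $\mu(l)$ are the rows of an array; a calibrated strategy should make every entry of the returned matrix asymptotically non-positive.

```python
import numpy as np


def calibration_gaps(types, outcomes, forecasts):
    """Finite-n version of the quantity in condition (1).

    types     : integers l_m in {0, ..., |L|-1}
    outcomes  : integers s_m in {0, ..., |S|-1}
    forecasts : array of shape (|L|, |S|); row l is the forecast mu(l)
    Returns an (|L|, |L|) array whose (l, k) entry is
    |N_n(l)|/n * (||sbar_n(l) - mu(l)||^2 - ||sbar_n(l) - mu(k)||^2).
    """
    types = np.asarray(types)
    forecasts = np.asarray(forecasts, dtype=float)
    num_types, num_outcomes = forecasts.shape
    n = len(types)
    # Each outcome s_m is identified with the Dirac mass delta_{s_m}.
    deltas = np.eye(num_outcomes)[np.asarray(outcomes)]
    gaps = np.zeros((num_types, num_types))
    for l in range(num_types):
        on_type = types == l                   # indicator of N_n(l)
        count = on_type.sum()
        if count == 0:
            continue                           # type never used: no constraint
        s_bar = deltas[on_type].mean(axis=0)   # empirical distribution on N_n(l)
        own = np.sum((s_bar - forecasts[l]) ** 2)
        others = np.sum((s_bar - forecasts) ** 2, axis=1)
        gaps[l] = count / n * (own - others)
    return gaps


# Toy check with S = {0, 1} and three forecasts.
mu = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(calibration_gaps([0, 0, 1, 1], [0, 0, 0, 1], mu).max())  # 0.0: no positive gap
print(calibration_gaps([0, 0], [1, 1], mu).max())              # 2.0: mu(0) was a poor forecast
```

Only finite histories can be checked this way; the definition itself concerns the limsup along every play.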

An alternative (and more general) way of defining calibration is the following. Player 1 is not restricted to make predictions in a finite set $\mathcal{M}$ and, at each stage, he can choose any probability in $\Delta(S)$. Consider any finite partition $\mathcal{P} = \{P(k), k \in K\}$ of $\Delta(S)$ with a diameter small enough (we recall that the diameter of a partition is defined as $\max_{k \in K}\sup_{x, y \in P(k)}\|x - y\|$). A strategy is ε-calibrated if the empirical distribution of outcomes (denoted by $\bar{s}_n(k)$) when the prediction is in $P(k)$ is asymptotically ε-close to $P(k)$ (as long as the frequency of $k \in K$ does not go to zero). Formally:

Definition 1.2 A strategy $\sigma$ of Player 1 is ε-calibrated if there exists $\eta > 0$ such that for every finite partition $\mathcal{P} = \{P(k), k \in K\}$ of $\Delta(S)$ with diameter smaller than $\eta$ and every strategy $\tau$ of Player 2:

$$\limsup_{n \to +\infty} \frac{|N_n(k)|}{n}\left(d^2(\bar{s}_n(k), P(k)) - \varepsilon^2\right) \le 0, \quad \forall k \in K, \ \mathbb{P}_{\sigma,\tau}\text{-a.s.}, \qquad (2)$$

where, for every set $E \subset \mathbb{R}^d$ and $z \in \mathbb{R}^d$, $d(z, E) = \inf_{e \in E}\|z - e\|_2$, and $\Delta(S)$ is seen as a subset of $\mathbb{R}^{|S|}$.

The following Lemma 1.3 states that a calibrated strategy with respect to a grid (as in Definition 1.1) is ε-calibrated (as in Definition 1.2); therefore we will only use the first formulation.

Lemma 1.3 For every $\varepsilon > 0$, there exists a finite set $\mathcal{M} = \{\mu(l), l \in L\}$ such that any calibrated strategy with respect to $\mathcal{M}$ is ε-calibrated.
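Proof: Let $\mathcal{M} = \{\mu(l), l \in L\}$ be a finite ε-grid of $\Delta(S)$: for every probability $\mu \in \Delta(S)$, there exists $\mu(l) \in \mathcal{M}$ such that $\|\mu - \mu(l)\| \le \varepsilon$. In particular, for every $l \in L$ and $n \in \mathbb{N}$, there exists $l' \in L$ such that $\|\bar{s}_n(l) - \mu(l')\| \le \varepsilon$. Equation (1), applied with $k = l'$, then implies that

$$\limsup_{n \to +\infty} \frac{|N_n(l)|}{n}\left(\|\bar{s}_n(l) - \mu(l)\|^2 - \varepsilon^2\right) \le 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

Let $2\eta$ be the smallest distance between two different points $\mu(l)$ and $\mu(l')$. In any finite partition $\mathcal{P} = \{P(k), k \in K\}$ of $\Delta(S)$ of diameter smaller than $\eta$, each $P(k)$ contains at most one $\mu(l)$; hence $\bar{s}_n(k)$ coincides with some $\bar{s}_n(l)$ and $d(\bar{s}_n(k), P(k)) \le \|\bar{s}_n(l) - \mu(l)\|$. So $\sigma$ is ε-calibrated. □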

Remark 1.4 Lemma 1.3 implies that one can construct an ε-calibrated strategy as soon as one can construct a calibrated strategy with respect to a finite ε-grid of $\Delta(S)$. The size of this grid is of the order of $\varepsilon^{-|S|}$ (exponential in $|S|$) and it is not known yet whether there exists an efficient algorithm (polynomial in $|S|$ and $1/\varepsilon$) to compute ε-calibrated strategies. The result holds with condition (2) replaced by

$$\limsup_{n \to +\infty} \frac{|N_n(k)|}{n}\left(d(\bar{s}_n(k), P(k)) - \varepsilon\right) \le 0, \quad \forall k \in K, \ \mathbb{P}_{\sigma,\tau}\text{-a.s.};$$

however, Lemma 1.3 is trivially true with the square terms $d^2(\bar{s}_n(k), P(k))$ and $\varepsilon^2$.
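Remark 1.4 relies on finite ε-grids of $\Delta(S)$. A minimal sketch, assuming the grid is the regular lattice $\{q/m\}$ (one convenient choice; the source does not fix a particular grid); `grid_resolution`, `simplex_grid` and `round_to_grid` are hypothetical helper names. Largest-remainder rounding moves every coordinate by less than $1/m$, so $m = \lceil \sqrt{|S|}/\varepsilon \rceil$ gives an ε-grid in the Euclidean norm, with $\binom{m+|S|-1}{|S|-1}$ points: polynomial in $1/\varepsilon$ with an exponent that grows with $|S|$.

```python
import itertools
import math

import numpy as np


def grid_resolution(eps, num_outcomes):
    """Smallest m such that the lattice {q/m} is an eps-grid of Delta(S).

    Largest-remainder rounding moves every coordinate by less than 1/m,
    hence moves a point by less than sqrt(|S|)/m in the Euclidean norm.
    """
    return math.ceil(math.sqrt(num_outcomes) / eps)


def simplex_grid(num_outcomes, m):
    """All points q/m of Delta(S), q a vector of non-negative integers summing to m."""
    slots = m + num_outcomes - 1
    points = []
    # "Stars and bars": choose the positions of num_outcomes - 1 separators.
    for bars in itertools.combinations(range(slots), num_outcomes - 1):
        edges = (-1,) + bars + (slots,)
        counts = [edges[k + 1] - edges[k] - 1 for k in range(num_outcomes)]
        points.append(np.array(counts, dtype=float) / m)
    return np.array(points)


def round_to_grid(mu, m):
    """Largest-remainder rounding of mu in Delta(S) to the lattice {q/m}."""
    scaled = np.asarray(mu, dtype=float) * m
    q = np.floor(scaled).astype(int)
    missing = int(m - q.sum())                 # units still to distribute
    largest = np.argsort(scaled - q)[::-1][:missing]
    q[largest] += 1
    return q / m


eps, num_outcomes = 0.1, 3
m = grid_resolution(eps, num_outcomes)         # 18
grid = simplex_grid(num_outcomes, m)
print(len(grid), math.comb(m + num_outcomes - 1, num_outcomes - 1))  # 190 190
```

Any other ε-grid works for Lemma 1.3; the lattice is used here only because its size and covering radius are easy to compute.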



1.2 Approachability 

We will prove in Sections 1.3 and 1.4 that calibration follows from no-regret and that no-regret follows from approachability (proofs originally due to, respectively, Foster and Vohra [9] and Hart and Mas-Colell [13]). We present here the notion of approachability introduced by Blackwell [3].

Consider a two-person game repeated in discrete time with vector payoffs, where at stage $n \in \mathbb{N}$, Player 1 (resp. Player 2) chooses the action $i_n \in I$ (resp. $j_n \in J$), where both $I$ and $J$ are finite. The corresponding vector payoff is $p_n = p(i_n, j_n)$, where $p$ is a mapping from $I \times J$ into $\mathbb{R}^d$. As usual, a behavioral strategy $\sigma$ (resp. $\tau$) of Player 1 (resp. Player 2) is a mapping from the set of finite histories $H = \bigcup_{n \in \mathbb{N}} (I \times J)^n$ to $\Delta(I)$ (resp. $\Delta(J)$).

For a closed set $E \subset \mathbb{R}^d$ and $\delta > 0$, we denote by $E^\delta = \{z \in \mathbb{R}^d, d(z, E) < \delta\}$ the δ-neighborhood of $E$ and by $\Pi_E(z) = \{e \in E, d(z, E) = \|z - e\|\}$ the set of closest points to $z$ in $E$.
Definition 1.5 i) A closed set $E \subset \mathbb{R}^d$ is approachable by Player 1 if for every $\varepsilon > 0$ there exist a strategy $\sigma$ of Player 1 and $N \in \mathbb{N}$ such that, for every strategy $\tau$ of Player 2 and every $n \ge N$:

$$\mathbb{E}_{\sigma,\tau}\left[d(\bar{p}_n, E)\right] \le \varepsilon \quad \text{and} \quad \mathbb{P}_{\sigma,\tau}\left(\sup_{n \ge N} d(\bar{p}_n, E) \ge \varepsilon\right) \le \varepsilon.$$

Such a strategy $\sigma$, when it is independent of $\varepsilon$, is called an approachability strategy of $E$.

ii) A set $E$ is excludable by Player 2 if there exists $\delta > 0$ such that the complement of $E^\delta$ is approachable by Player 2.

In words, a set $E \subset \mathbb{R}^d$ is approachable by Player 1 if he has a strategy such that the average payoff converges almost surely to $E$, uniformly with respect to the strategies of Player 2.

Blackwell [3] noticed that a closed set $E$ that fulfills a purely geometrical condition (see Definition 1.6) is approachable by Player 1. Before stating it, let us denote by $P^1(x) = \{p(x, y), y \in \Delta(J)\}$ the set of expected payoffs compatible with $x \in \Delta(I)$, where $p(x, y) = \mathbb{E}_{x,y}[p(i, j)]$, and we define similarly $P^2(y) = \{p(x, y), x \in \Delta(I)\}$.

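Definition 1.6 A closed subset $E$ of $\mathbb{R}^d$ is a B-set if, for every $z \notin E$, there exist $p \in \Pi_E(z)$ and $x \ (= x(z, p)) \in \Delta(I)$ such that the hyperplane through $p$ and perpendicular to $z - p$ separates $z$ from $P^1(x)$, or formally:

$$\forall z \notin E, \ \exists p \in \Pi_E(z), \ \exists x \in \Delta(I), \quad \langle p(x, y) - p, z - p\rangle \le 0, \quad \forall y \in \Delta(J). \qquad (3)$$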

Informally, from any point $z$ outside $E$ there is a closest point $p$ and a probability $x \in \Delta(I)$ such that, no matter the choice of Player 2, the expected payoff and $z$ are on different sides of the hyperplane through $p$ and perpendicular to $z - p$. To be precise, this definition (and the following theorem) does not require that $J$ is finite: one can assume that Player 2 chooses an outcome vector $U = (U^i)_{i \in I}$, with $U^i \in [-1, 1]^d$ the payoff associated with action $i$, so that the expected payoff is $p(x, U) = \langle x, U\rangle = \sum_{i \in I} x(i)U^i$.

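Theorem 1.7 (Blackwell [3]) If $E$ is a B-set, then $E$ is approachable by Player 1. Moreover, the strategy $\sigma$ of Player 1 defined by $\sigma(h_n) = x(\bar{p}_n)$, where $x(\bar{p}_n)$ is given by (3) at $z = \bar{p}_n$ (and is arbitrary if $\bar{p}_n \in E$), is such that, for every strategy $\tau$ of Player 2:

$$\mathbb{E}_{\sigma,\tau}\left[d^2(\bar{p}_n, E)\right] \le \frac{4B}{n} \quad \text{and} \quad \mathbb{P}_{\sigma,\tau}\left(\sup_{n \ge N} d(\bar{p}_n, E) \ge \eta\right) \le \frac{8B}{\eta^2 N}, \qquad (4)$$

with $B = \sup_{i,j}\|p(i, j)\|^2$.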
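Condition (3) is easy to test for a given candidate $x$. A minimal sketch, assuming finite action sets and a user-supplied projection onto $E$ (the helper `blackwell_condition_holds` and the toy payoffs are hypothetical): since $y \mapsto \langle p(x,y) - p, z - p\rangle$ is linear, checking the pure actions of Player 2 suffices. Blackwell's strategy of Theorem 1.7 plays such an $x$ at $z = \bar{p}_n$; finding $x$ is a separate optimization problem.

```python
import numpy as np


def blackwell_condition_holds(payoffs, x, z, proj, tol=1e-12):
    """Check inequality (3) at a point z outside E for a candidate x in Delta(I).

    payoffs : array (|I|, |J|, d), payoffs[i, j] = p(i, j) in R^d
    proj    : returns a closest point of E to z, i.e. an element of Pi_E(z)
    y -> <p(x, y) - p, z - p> is linear in y, so checking the pure
    actions j of Player 2 is enough.
    """
    p = proj(z)
    expected = np.einsum("i,ijd->jd", np.asarray(x, dtype=float), payoffs)  # p(x, j)
    return bool(np.all((expected - p) @ (z - p) <= tol))


def proj_orthant(z):
    """Projection onto the negative orthant: coordinatewise minimum with 0."""
    return np.minimum(z, 0.0)


payoffs = np.array([[[1.0, 0.0], [-1.0, 0.0]],
                    [[-1.0, 0.0], [1.0, 0.0]]])
z = np.array([1.0, -1.0])
print(blackwell_condition_holds(payoffs, [0.5, 0.5], z, proj_orthant))  # True
print(blackwell_condition_holds(payoffs, [1.0, 0.0], z, proj_orthant))  # False
```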
In the case of a convex set $C$, a complete characterization is available:

Corollary 1.8 (Blackwell [3]) A closed convex set $C \subset \mathbb{R}^d$ is approachable by Player 1 if and only if:

$$P^2(y) \cap C \neq \emptyset, \quad \forall y \in \Delta(J). \qquad (5)$$

In particular, a closed convex set $C$ is either approachable by Player 1, or excludable by Player 2.

Remark 1.9 Corollary 1.8 implies that there are (at least) two different ways to prove that a convex set is approachable. The first one, called the direct proof, consists in proving that $C$ is a B-set, while the second one, called the indirect proof, consists in proving that $C$ is not excludable by Player 2, which reduces to finding, for every $y \in \Delta(J)$, some $x \in \Delta(I)$ such that $p(x, y) \in C$.

1.3 From approachability to internal no-regret

Consider a two-person repeated game in discrete time where, at stage $n \in \mathbb{N}$, Player 1 chooses $i_n \in I$ as above and Player 2 chooses a vector $U_n \in [-1, 1]^c$ (with $c = |I|$). The associated payoff is $U_n^{i_n}$, the $i_n$-th coordinate of $U_n$. The internal regret of the stage is the matrix $R_n = R(i_n, U_n)$, where $R$ is the mapping from $I \times [-1, 1]^c$ to $\mathbb{R}^{c^2}$ defined by:

$$R(i, U) = \left(\mathbb{1}_{\{i = k\}}\left(U^l - U^k\right)\right)_{k, l \in I}.$$

With this definition, the average internal regret $\bar{R}_n$ is defined by:

$$\bar{R}_n = \frac{1}{n}\sum_{m=1}^{n} R_m = \left(\frac{|N_n(i)|}{n}\left(\bar{U}_n(i)^j - \bar{U}_n(i)^i\right)\right)_{i, j \in I}.$$

Definition 1.10 (Foster and Vohra [10]) A strategy $\sigma$ of Player 1 is internally consistent if for any strategy $\tau$ of Player 2:

$$\limsup_{n \to +\infty} \bar{R}_n \le 0 \ \text{(coordinatewise)}, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

In words, a strategy is internally consistent if, for every $i \in I$ (with a positive frequency), Player 1 could not have increased his payoff had he known, before the beginning of the game, the empirical distribution of Player 2's choices on $N_n(i)$. Stated differently, when Player 1 played action $i$, it was his best (stationary) strategy. The existence of such strategies was first proved by Foster and Vohra [10] and Fudenberg and Levine [11].

Theorem 1.11 There exist internally consistent strategies.

Hart and Mas-Colell [13] noted that an internally consistent strategy can be obtained by constructing a strategy that approaches the negative orthant $\Omega = \mathbb{R}_-^{c^2}$ in the auxiliary game where the vector payoff at stage $n$ is $R_n$. Such a strategy, derived from approachability theory, is stronger than just internally consistent, since the regret converges to the negative orthant uniformly with respect to Player 2's strategy (which was not required in Definition 1.10).

The proof of the fact that $\Omega$ is a B-set relies on the two following lemmas: Lemma 1.12 gives a geometrical property of $\Omega$ and Lemma 1.13 gives a property of the function $R$.

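Lemma 1.12 Let $\Pi_\Omega(\cdot)$ be the projection onto $\Omega$. Then, for every $A \in \mathbb{R}^{c^2}$:

$$\langle \Pi_\Omega(A), A - \Pi_\Omega(A)\rangle = 0. \qquad (6)$$

Proof: Since $\Omega = \mathbb{R}_-^{c^2}$, we have $A^+ = A - \Pi_\Omega(A)$, where $A^+_{ij} = \max(A_{ij}, 0)$, and similarly $A^- = \Pi_\Omega(A)$, where $A^-_{ij} = \min(A_{ij}, 0)$. The result is just a rewriting of $\langle A^-, A^+\rangle = 0$. □

For every $(c \times c)$-matrix $A = (a_{ij})_{i,j \in I}$ with non-negative coefficients, $\lambda \in \Delta(I)$ is an invariant probability of $A$ if for every $i \in I$:

$$\sum_{j \in I} \lambda(j)\, a_{ji} = \lambda(i) \sum_{j \in I} a_{ij}.$$

The existence of an invariant probability follows from the similar result for Markov chains, implied by the Perron-Frobenius Theorem (see e.g. Seneta [27]).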

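Lemma 1.13 Let $A = (a_{ij})_{i,j \in I}$ be a non-negative matrix. Then for every invariant probability $\lambda$ of $A$ and every $U \in \mathbb{R}^c$:

$$\langle A, \mathbb{E}_\lambda[R(\cdot, U)]\rangle = 0. \qquad (7)$$

Proof: The $(i, j)$-th coordinate of $\mathbb{E}_\lambda[R(\cdot, U)]$ is $\lambda(i)(U^j - U^i)$, therefore:

$$\langle A, \mathbb{E}_\lambda[R(\cdot, U)]\rangle = \sum_{i, j \in I} a_{ij}\,\lambda(i)\left(U^j - U^i\right),$$

and the coefficient of each $U^i$ is $\sum_{j \in I} a_{ji}\lambda(j) - \sum_{j \in I} a_{ij}\lambda(i) = 0$, because $\lambda$ is an invariant measure of $A$. Therefore $\langle A, \mathbb{E}_\lambda[R(\cdot, U)]\rangle = 0$. □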
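Lemma 1.13 can be checked numerically. A minimal sketch, assuming NumPy; `internal_regret` and `invariant_probability` are hypothetical helper names. The invariant probability is obtained by solving the linear conditions $\lambda Q = 0$, $\sum_i \lambda(i) = 1$ with $Q = A - \mathrm{diag}(A\mathbf{1})$ by least squares, which is one possible solver among others.

```python
import numpy as np


def internal_regret(i, U):
    """R(i, U): the (k, l) entry is 1{i = k} * (U[l] - U[k])."""
    U = np.asarray(U, dtype=float)
    R = np.zeros((len(U), len(U)))
    R[i] = U - U[i]
    return R


def invariant_probability(A):
    """lam in Delta(I) with sum_j lam[j] A[j, i] = lam[i] * sum_j A[i, j] for all i.

    A must have non-negative coefficients. The conditions read lam @ Q = 0 with
    Q = A - diag(row sums of A); they are stacked with sum(lam) = 1 and solved
    by least squares.
    """
    A = np.asarray(A, dtype=float)
    c = A.shape[0]
    Q = A - np.diag(A.sum(axis=1))
    M = np.vstack([Q.T, np.ones((1, c))])
    b = np.zeros(c + 1)
    b[-1] = 1.0
    lam, *_ = np.linalg.lstsq(M, b, rcond=None)
    lam = np.clip(lam, 0.0, None)      # discard tiny negative round-off
    return lam / lam.sum()


rng = np.random.default_rng(0)
A = rng.random((4, 4))                 # non-negative matrix
U = rng.uniform(-1.0, 1.0, size=4)
lam = invariant_probability(A)
expected_regret = sum(lam[i] * internal_regret(i, U) for i in range(4))
print(np.isclose(np.sum(A * expected_regret), 0.0))  # True, as in (7)
```

When $A = 0$ every probability is invariant, and the least-squares solution is then the uniform one.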
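Proof of Theorem 1.11: Combining equations (6) (with $A = \bar{R}_n$) and (7) (with $A = \bar{R}_n^+ = \bar{R}_n - \Pi_\Omega(\bar{R}_n)$) gives:

$$\langle \mathbb{E}_{\lambda_n}[R(\cdot, U)] - \Pi_\Omega(\bar{R}_n), \bar{R}_n - \Pi_\Omega(\bar{R}_n)\rangle = 0,$$

for every invariant probability $\lambda_n$ of $\bar{R}_n^+$ and every $U \in [-1, 1]^c$.

Define the strategy $\sigma$ of Player 1 by $\sigma(h_n) = \lambda_n$. The expected payoff at stage $n+1$ (given $h_n$ and $U_{n+1} = U$) is $\mathbb{E}_{\lambda_n}[R(\cdot, U)]$, so condition (3) holds at $z = \bar{R}_n$ with $p = \Pi_\Omega(\bar{R}_n)$ and $x = \lambda_n$: $\Omega$ is a B-set and is approachable by Player 1. □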
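The strategy $\sigma(h_n) = \lambda_n$ from this proof can be simulated. A minimal sketch, assuming Player 2's vectors are drawn in advance (an oblivious opponent, chosen only to keep the example short; the theorem covers every strategy $\tau$); `play_against` is a hypothetical name, and the invariant probability is computed by solving its defining linear system by least squares.

```python
import numpy as np


def invariant_probability(A):
    """Invariant probability of a non-negative matrix, by least squares on its linear conditions."""
    A = np.asarray(A, dtype=float)
    c = A.shape[0]
    Q = A - np.diag(A.sum(axis=1))
    M = np.vstack([Q.T, np.ones((1, c))])
    b = np.zeros(c + 1)
    b[-1] = 1.0
    lam, *_ = np.linalg.lstsq(M, b, rcond=None)
    lam = np.clip(lam, 0.0, None)
    return lam / lam.sum()


def play_against(U_sequence, seed=0):
    """Play sigma(h_n) = lam_n against a sequence U_1, U_2, ... fixed in advance.

    Returns the average internal regret matrix Rbar_n after all stages.
    """
    rng = np.random.default_rng(seed)
    n_stages, c = U_sequence.shape
    cumulative = np.zeros((c, c))              # equals n * Rbar_n
    for U in U_sequence:
        # (n Rbar_n)^+ and (Rbar_n)^+ have the same invariant probabilities.
        lam = invariant_probability(np.maximum(cumulative, 0.0))
        i = rng.choice(c, p=lam)
        cumulative[i] += U - U[i]              # R(i, U) only has a non-zero row i
    return cumulative / n_stages


U_sequence = np.random.default_rng(1).uniform(-1.0, 1.0, size=(3000, 3))
R_bar = play_against(U_sequence)
print(R_bar.max())   # largest average internal regret after 3000 stages
```

With i.i.d. vectors almost any strategy has small regret, so this is only a sanity check; an adaptive opponent would be a more demanding test, and Remark 1.14 gives bounds that hold uniformly in $\tau$.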

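Remark 1.14 The construction of the strategy is based on approachability properties; therefore the convergence is uniform with respect to the strategies of Player 2. Theorem 1.7 implies that, for every $\eta > 0$ and every strategy $\tau$ of Player 2:

$$\mathbb{P}_{\sigma,\tau}\left(\exists n \ge N, \exists i, j \in I, \ \frac{|N_n(i)|}{n}\left(\bar{U}_n(i)^j - \bar{U}_n(i)^i\right) \ge \eta\right) = O\left(\frac{1}{\eta^2 N}\right)$$

and

$$\mathbb{E}_{\sigma,\tau}\left[\sup_{i, j \in I}\frac{|N_n(i)|}{n}\left(\bar{U}_n(i)^j - \bar{U}_n(i)^i\right)\right] = O\left(\frac{1}{\sqrt{n}}\right).$$

Although they are not required by Definition 1.10, those bounds will be useful to prove that calibration implies approachability.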

1.4 From internal regret to calibration

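The construction of calibrated strategies can be reduced to the construction of internally consistent strategies. The proof of Sorin [28] simplifies the one originally due to Foster and Vohra [10] by using the following lemma:

Lemma 1.15 Let $(a_m)_{m \in \mathbb{N}}$ be a sequence in $\mathbb{R}^d$ and $\alpha, \beta$ two points in $\mathbb{R}^d$. Then for every $n \in \mathbb{N}^*$:

$$\frac{1}{n}\sum_{m=1}^{n}\left(\|a_m - \alpha\|_2^2 - \|a_m - \beta\|_2^2\right) = \|\bar{a}_n - \alpha\|_2^2 - \|\bar{a}_n - \beta\|_2^2, \qquad (8)$$

with $\|\cdot\|_2$ the Euclidean norm of $\mathbb{R}^d$.

Proof: Develop the sums in equation (8) to get the result. □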
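Identity (8) holds because $\|a - \alpha\|_2^2 - \|a - \beta\|_2^2 = 2\langle a, \beta - \alpha\rangle + \|\alpha\|_2^2 - \|\beta\|_2^2$ is affine in $a$, so averaging over $m$ amounts to replacing $a_m$ by $\bar{a}_n$. A quick numerical check, assuming NumPy (the helper names are ours):

```python
import numpy as np


def average_of_differences(a, alpha, beta):
    """Left-hand side of (8): mean over m of ||a_m - alpha||^2 - ||a_m - beta||^2."""
    return np.mean(np.sum((a - alpha) ** 2, axis=1) - np.sum((a - beta) ** 2, axis=1))


def difference_at_average(a, alpha, beta):
    """Right-hand side of (8): ||abar_n - alpha||^2 - ||abar_n - beta||^2."""
    a_bar = a.mean(axis=0)
    return np.sum((a_bar - alpha) ** 2) - np.sum((a_bar - beta) ** 2)


rng = np.random.default_rng(0)
a = rng.normal(size=(50, 3))           # a_1, ..., a_50 in R^3
alpha, beta = rng.normal(size=3), rng.normal(size=3)
print(np.isclose(average_of_differences(a, alpha, beta),
                 difference_at_average(a, alpha, beta)))  # True
```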
Now, we can prove the following:

Theorem 1.16 (Foster and Vohra [10]) Let $\mathcal{M}$ be a finite grid of $\Delta(S)$. There exist calibrated strategies of Player 1 with respect to $\mathcal{M}$. In particular, for every $\varepsilon > 0$ there exist ε-calibrated strategies.
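Proof: We start with the framework described in Section 1.3. Consider the auxiliary two-person game with vector payoffs defined as follows. At stage $n \in \mathbb{N}$, Player 1 (resp. Player 2) chooses the action $l_n \in L$ (resp. $s_n \in S$), which generates the vector payoff $R_n = R(l_n, U_n) \in \mathbb{R}^{|L|^2}$, where $R$ is the internal regret mapping of Section 1.3 (with $I$ replaced by $L$) and:

$$U_n = \left(-\|s_n - \mu(l)\|_2^2\right)_{l \in L} \in \mathbb{R}^{|L|}.$$

By definition of $R$ and using Lemma 1.15, for every $n \in \mathbb{N}^*$ and every $l, k \in L$:

$$\bar{R}_n^{lk} = \frac{|N_n(l)|}{n}\,\frac{\sum_{m \in N_n(l)}\left(\|s_m - \mu(l)\|_2^2 - \|s_m - \mu(k)\|_2^2\right)}{|N_n(l)|} = \frac{|N_n(l)|}{n}\left(\|\bar{s}_n(l) - \mu(l)\|_2^2 - \|\bar{s}_n(l) - \mu(k)\|_2^2\right).$$

Let $\sigma$ be an internally consistent strategy in this auxiliary game; then for every $l \in L$ and $k \in L$:

$$\limsup_{n \to +\infty}\frac{|N_n(l)|}{n}\left(\|\bar{s}_n(l) - \mu(l)\|_2^2 - \|\bar{s}_n(l) - \mu(k)\|_2^2\right) \le 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

Therefore $\sigma$ is calibrated with respect to $\mathcal{M}$; if $\mathcal{M}$ is an ε-grid of $\Delta(S)$, then $\sigma$ is ε-calibrated. □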
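The reduction in this proof can be checked on data. A minimal sketch, assuming outcomes are integers identified with Dirac masses; all function names are hypothetical. It builds $U_n$ from the outcome as above, accumulates $\bar{R}_n$, and compares it with the calibration quantity of (1) computed directly; the two coincide by Lemma 1.15.

```python
import numpy as np


def auxiliary_vector(outcome, forecasts):
    """U_n = (-||delta_{s_n} - mu(l)||^2)_{l in L}."""
    delta = np.zeros(forecasts.shape[1])
    delta[outcome] = 1.0
    return -np.sum((delta - forecasts) ** 2, axis=1)


def average_internal_regret(types, outcomes, forecasts):
    """Rbar_n = (1/n) sum_m R(l_m, U_m), where R(l, U)[l, k] = U[k] - U[l]."""
    num_types = forecasts.shape[0]
    total = np.zeros((num_types, num_types))
    for l, s in zip(types, outcomes):
        U = auxiliary_vector(s, forecasts)
        total[l] += U - U[l]
    return total / len(types)


def calibration_gaps(types, outcomes, forecasts):
    """(l, k) entry: |N_n(l)|/n * (||sbar_n(l) - mu(l)||^2 - ||sbar_n(l) - mu(k)||^2)."""
    types = np.asarray(types)
    deltas = np.eye(forecasts.shape[1])[np.asarray(outcomes)]
    num_types = forecasts.shape[0]
    gaps = np.zeros((num_types, num_types))
    for l in range(num_types):
        on_type = types == l
        if on_type.any():
            s_bar = deltas[on_type].mean(axis=0)
            dist = np.sum((s_bar - forecasts) ** 2, axis=1)
            gaps[l] = on_type.mean() * (dist[l] - dist)
    return gaps


rng = np.random.default_rng(0)
forecasts = rng.dirichlet(np.ones(3), size=4)   # four forecasts mu(l) in Delta(S), |S| = 3
types = rng.integers(0, 4, size=200)
outcomes = rng.integers(0, 3, size=200)
print(np.allclose(average_internal_regret(types, outcomes, forecasts),
                  calibration_gaps(types, outcomes, forecasts)))  # True
```

Feeding these $U_n$ to any internally consistent strategy therefore yields a calibrated forecaster, which is exactly the argument of the proof.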

Remark 1.17 We have proved that $\sigma$ is such that, for every $l \in L$, $\bar{s}_n(l)$ is closer to $\mu(l)$ than to any other $\mu(k)$, as soon as $|N_n(l)|/n$ is not too small.

The facts that $s_n$ belongs to a finite set $S$ and that the $\mu(l)$ are probabilities over $S$ are irrelevant: one can show that for any finite set $\{a(l) \in \mathbb{R}^d, l \in L\}$, Player 1 has a strategy $\sigma$ such that, for any bounded sequence $(a_m)_{m \in \mathbb{N}}$ in $\mathbb{R}^d$ and for every $l$ and $k$:

$$\limsup_{n \to +\infty}\frac{|N_n(l)|}{n}\left(\|\bar{a}_n(l) - a(l)\|^2 - \|\bar{a}_n(l) - a(k)\|^2\right) \le 0.$$

1.5 From calibration to approachability

The proof of Theorem 1.16 shows that the construction of a calibrated strategy can be obtained through an approachability strategy of an orthant in an auxiliary game.

Conversely, we will show that the approachability of a convex B-set can be reduced to the existence of a calibrated strategy in an auxiliary game, and so give a new proof of Corollary 1.8 (and, more importantly, construct explicit strategies).
Alternative proof of Corollary 1.8: The idea of the proof is very natural: assume that condition (5) is satisfied and rephrase it as:

$$\forall y \in \Delta(J), \ \exists x \ (= x_y) \in \Delta(I), \quad p(x_y, y) \in C. \qquad (9)$$

If Player 1 knew $y_n$ in advance, he would just have to play according to $x_{y_n}$ at stage $n$, so that the expected payoff $\mathbb{E}_{\sigma,\tau}[p_n]$ would be in $C$. Since $C$ is convex, the average payoff would also be in $C$. Obviously Player 1 does not know $y_n$ but, using calibration, he can make good predictions about it.

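Since $p$ is multilinear and therefore uniformly continuous on $\Delta(I) \times \Delta(J)$, for every $\varepsilon > 0$ there exists $\delta > 0$ such that:

$$\forall y, y' \in \Delta(J), \quad \|y - y'\|_2 \le 2\delta \ \Rightarrow \ p(x_y, y') \in C^\varepsilon.$$

We introduce the auxiliary game $\Gamma$ where Player 2 chooses an action (or outcome) $j \in J$ and Player 1 forecasts it by using $\{y(l), l \in L\}$, a finite δ-grid of $\Delta(J)$. Let $\sigma$ be a calibrated strategy for Player 1, so that the empirical distribution $\bar{j}_n(l)$ of the actions of Player 2 on $N_n(l)$ is asymptotically δ-close to $y(l)$.

Define the strategy of Player 1 in the initial game by performing $\sigma$ and, if $l_n = l$, by playing according to $x(l) := x_{y(l)} \in \Delta(I)$, as depicted in (9). Since the choices of actions of the two players are independent, $\bar{p}_n(l)$ will be close to $p(x(l), \bar{j}_n(l))$, hence close to $p(x(l), y(l))$ (because $\sigma$ is calibrated) and finally close to $C^\varepsilon$, as soon as $|N_n(l)|/n$ is not too small.

Indeed, by construction of $\sigma$, for every $\eta > 0$ there exists $N_1 \in \mathbb{N}$ such that, for every strategy $\tau$ of Player 2:

$$\mathbb{P}_{\sigma,\tau}\left(\forall l \in L, \forall n \ge N_1, \ \frac{|N_n(l)|}{n}\left(\|\bar{j}_n(l) - y(l)\|^2 - \delta^2\right) \le \eta\right) \ge 1 - \eta.$$

This implies that, with probability greater than $1 - \eta$, for every $l \in L$ and $n \ge N_1$, either $\|\bar{j}_n(l) - y(l)\| \le 2\delta$ or $|N_n(l)|/n \le \eta/(3\delta^2)$. Therefore, denoting by $M$ an upper bound of $d(p(x, y), C)$ over $\Delta(I) \times \Delta(J)$, with $\mathbb{P}_{\sigma,\tau}$-probability at least $1 - \eta$:

$$\forall l \in L, \forall n \ge N_1, \quad \frac{|N_n(l)|}{n}\, d\big(p(x(l), \bar{j}_n(l)), C\big) \le \varepsilon\,\frac{|N_n(l)|}{n} + \frac{M\eta}{3\delta^2}. \qquad (10)$$

Since $\frac{|N_n(l)|}{n}\left(\bar{p}_n(l) - p(x(l), \bar{j}_n(l))\right) = \frac{1}{n}\sum_{m=1}^{n}\mathbb{1}_{\{l_m = l\}}\left(p(i_m, j_m) - p(x(l), j_m)\right)$ is an average of bounded martingale differences, the Hoeffding-Azuma inequality [2, 14] implies that there exist constants $\kappa_1, \kappa_2 > 0$, depending only on the bound on the payoffs and on $d$, such that for any $\eta > 0$, $n \in \mathbb{N}$, $l \in L$, $\sigma$ and $\tau$:

$$\mathbb{P}_{\sigma,\tau}\left(\frac{|N_n(l)|}{n}\left\|\bar{p}_n(l) - p\big(x(l), \bar{j}_n(l)\big)\right\| \ge \eta\right) \le \kappa_1 \exp\left(-\kappa_2\, n\eta^2\right).$$

Summing over $n \in \{N, N+1, \ldots\}$ and $l \in L$ gives that, with $\mathbb{P}_{\sigma,\tau}$-probability at most $|L|\sum_{n \ge N}\kappa_1\exp(-\kappa_2 n\eta^2)$:

$$\sup_{n \ge N}\sup_{l \in L}\frac{|N_n(l)|}{n}\left\|\bar{p}_n(l) - p\big(x(l), \bar{j}_n(l)\big)\right\| \ge \eta. \qquad (11)$$

So for every $\eta > 0$ there exists $N_2 \in \mathbb{N}$ such that the event in (11), with $N = N_2$, has probability at most $\eta$.

Since $C$ is a convex set, $d(\cdot, C)$ is convex and, with probability at least $1 - 2\eta$, for every $n \ge \max(N_1, N_2)$:

$$d(\bar{p}_n, C) \le \sum_{l \in L}\frac{|N_n(l)|}{n}\, d\big(\bar{p}_n(l), C\big) \le \sum_{l \in L}\frac{|N_n(l)|}{n}\left(d\big(p(x(l), \bar{j}_n(l)), C\big) + \left\|\bar{p}_n(l) - p\big(x(l), \bar{j}_n(l)\big)\right\|\right) \le \varepsilon + |L|\,\eta\left(\frac{M}{3\delta^2} + 1\right).$$

Since $\varepsilon$ and $\eta$ are arbitrary, $C$ is approachable by Player 1.

On the other hand, if there exists $y$ such that $P^2(y) \cap C = \emptyset$, then Player 2 can approach $P^2(y)$ by playing at every stage according to $y$. Therefore $C$ is not approachable by Player 1. □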
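The initialization phase of this strategy, matching each grid point $y(l)$ with some $x(l) = x_{y(l)}$ satisfying (9), is easy to sketch in a simple case. A minimal sketch, assuming a scalar payoff game (matching pennies with full monitoring, a hypothetical example) and the convex set $C = [0, +\infty)$; since the value of this game is 0, condition (9) holds with $x_y$ a pure best response to $y$. In the repeated game, Player 1 would then play $x(l_n)$ whenever the calibrated forecaster of the auxiliary game selects type $l_n$.

```python
import numpy as np

# Matching pennies with full monitoring: rows = Player 1's actions, columns = Player 2's.
payoffs = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])


def two_point_grid(m):
    """Grid {y(l)} of Delta(J) for |J| = 2: y(l) = (k/m, 1 - k/m), k = 0, ..., m."""
    t = np.arange(m + 1) / m
    return np.stack([t, 1.0 - t], axis=1)


def match_grid(payoffs, grid):
    """For every y(l), a pure best response x(l) = x_{y(l)} (a vertex of Delta(I))."""
    values = grid @ payoffs.T                 # values[l, i] = p(i, y(l))
    best = values.argmax(axis=1)
    return np.eye(payoffs.shape[0])[best]


grid = two_point_grid(10)
x_of_type = match_grid(payoffs, grid)
# p(x(l), y(l)) for every type l: all of them lie in C = [0, +inf).
print(np.einsum("li,ij,lj->l", x_of_type, payoffs, grid).min() >= 0)  # True
```

For vector payoffs and a general convex $C$, each $x(l)$ would instead be searched in a finite grid of $\Delta(I)$, as discussed in Section 1.6.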

Remark 1.18 To deduce that $\bar{p}_n$ is in $C^\varepsilon$ from the fact that $\bar{p}_n(l)$ is in $C^\varepsilon$ for every $l \in L$, it is necessary that $C$ (or $d(\cdot, C)$) is convex. So this proof does not work if $C$ is not convex.



1.6 Remarks on the algorithm 

a) Blackwell proved Corollary 1.8 using von Neumann's minmax theorem, which allows one to show that a convex set $C$ that fulfills condition (5) is a B-set. Indeed, let $z$ be a point outside $C$. Recall that for every $y \in \Delta(J)$ there exists $x_y \in \Delta(I)$ such that $p(x_y, y) \in C$. Since $C$ is convex, if we denote by $\Pi_C(z)$ the projection of $z$ onto it, then $\langle c - \Pi_C(z), z - \Pi_C(z)\rangle \le 0$ for every $c \in C$. Therefore,

$$\forall y \in \Delta(J), \ \exists x \in \Delta(I), \quad \langle \mathbb{E}_{x,y}[p(i, j)] - \Pi_C(z), z - \Pi_C(z)\rangle \le 0,$$

and if we define $g(x, y) = \langle \mathbb{E}_{x,y}[p(i, j)] - \Pi_C(z), z - \Pi_C(z)\rangle$, then $g$ is linear in each of its variables, so

$$\min_{x \in \Delta(I)}\max_{y \in \Delta(J)} g(x, y) = \max_{y \in \Delta(J)}\min_{x \in \Delta(I)} g(x, y) \le 0,$$

which implies that $C$ is a B-set.

The strategy $\sigma$ defined by $\sigma(h_n) = x_n$, where $x_n$ is any minimizer of $\max_{y \in \Delta(J)} g(x, y)$ at $z = \bar{p}_n$, is an approachability strategy, said to be implicit since there is no easy way to construct it. Indeed, computing $\sigma$ requires finding, stage by stage, an optimal action in a zero-sum game, or equivalently solving a linear program. There exist polynomial algorithms (see Khachiyan [15]); however, they are slower than Gaussian elimination and their constants can be too large for any practical use. Nonetheless, it is possible to find an ε-optimal solution by repeating the exponential weight algorithm a polynomial number of times (see Cesa-Bianchi and Lugosi [5], Section 7.2, and Mannor and Stoltz [20]).
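The ε-optimal alternative mentioned above can be sketched with exponential weights. A minimal sketch, assuming NumPy, finite action sets and a user-supplied projection onto $C$; `approx_blackwell_step` is a hypothetical name, and running exponential weights for Player 1 against best replies of the maximizer is one standard way to approximate a zero-sum game, not necessarily the exact procedure of [5] or [20]. By the usual exponential-weights regret bound, if $g$ takes values in an interval of length $R$, the returned value exceeds $\min_x \max_y g(x, y)$ by at most $\ln|I|/(\eta T) + \eta R^2/8$.

```python
import numpy as np


def approx_blackwell_step(payoffs, z, proj, T=2000, eta=0.05):
    """Approximate minimizer x of max_y g(x, y) with exponential weights.

    g(x, y) = <E_{x,y}[p(i, j)] - Pi_C(z), z - Pi_C(z)>. Player 1 runs
    exponential weights on the losses g(i, j_t), where j_t is a best reply
    (for the maximizer) to the current weights; the average of the weights
    is then approximately optimal.
    """
    p = proj(z)
    g = (payoffs - p) @ (z - p)              # g[i, j] = g(e_i, e_j)
    log_w = np.zeros(g.shape[0])
    x_sum = np.zeros(g.shape[0])
    for _ in range(T):
        x = np.exp(log_w - log_w.max())      # normalized weights, computed stably
        x /= x.sum()
        x_sum += x
        j = np.argmax(x @ g)                 # column maximizing g(x, j)
        log_w -= eta * g[:, j]               # exponential-weights update
    x_bar = x_sum / T
    return x_bar, np.max(x_bar @ g)          # candidate and its exact worst-case value


def proj_orthant(z):
    return np.minimum(z, 0.0)


payoffs = np.array([[[1.0, 0.0], [-1.0, 0.0]],
                    [[-1.0, 0.0], [1.0, 0.0]]])
x_bar, value = approx_blackwell_step(payoffs, np.array([1.0, -1.0]), proj_orthant)
print(x_bar, value)   # x_bar close to (1/2, 1/2), value close to 0
```

A non-positive (or nearly non-positive) `value` certifies that the returned $x$ satisfies condition (3) at $z$ up to the stated error.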

For a fixed $\varepsilon > 0$, the strategy we described (which approaches $C^\varepsilon$) computes at each stage an invariant measure of a matrix with non-negative coefficients. This reduces to solving a system of linear equations which is guaranteed to have a solution, and this is done in polynomial time (in $|L|$) by, for example and as proposed by Foster and Vohra [10], Gaussian elimination. If payoffs are bounded by 1, then one can take for $\{y(l), l \in L\}$ any $\varepsilon/2$-grid of $\Delta(J)$, so $|L|$ is bounded by $(2/\varepsilon)^{|J|}$. Moreover, since the strategy aims to approach $C^\varepsilon$, it is not necessary to determine $x(l)$ exactly: one can choose the $x(l)$ in any $\varepsilon/2$-grid of $\Delta(I)$.

In conclusion, Blackwell's implicit algorithm constructs a strategy that approaches (exactly) a convex set $C$ by solving, stage by stage, a linear program, without any initialization phase. For every $\varepsilon > 0$, our explicit algorithm constructs a strategy that approaches $C^\varepsilon$ by solving, stage by stage, a system of linear equations, with an initialization phase (the matching between $x(l)$ and $y(l)$) requiring at most … steps.
b) Blackwell's Theorem states that if, for every move $y \in \Delta(J)$ of Player 2, Player 1 has an action $x \in \Delta(I)$ such that $p(x, y) \in C$, then $C$ is approachable by Player 1. In other words, assume that in the one-stage (expected) game where Player 2 plays first and Player 1 plays second, Player 1 has a strategy such that the payoff is in a convex set $C$. Then he also has a strategy such that the average payoff converges to $C$ in the repeated (expected) game where Player 2 plays second and Player 1 plays first.

The use of calibration transforms this implicit statement into an explicit one: while performing a calibrated strategy (in an auxiliary game where $J$ plays the role of the set of outcomes), Player 1 can enforce the property that, for every $l \in L$, the average move of Player 2 on $N_n(l)$ is almost $y(l)$. So he just has to play $x_{y(l)}$ on these stages, and he could not do better.

c) We stress the fact that the construction of an approachability strategy of $C^\varepsilon$ reduces to the construction of a calibrated strategy in an auxiliary game, hence to the construction of an internally consistent strategy in a second auxiliary game, and therefore to the construction of an approachability strategy of a negative orthant in a third auxiliary game. In conclusion, the approachability of an arbitrary convex set reduces to the approachability of an orthant. Along with equations (10) and (11), this implies that $\mathbb{E}_{\sigma,\tau}[d(\bar{p}_n, C) - \varepsilon] \le O(n^{-1/2})$. However, as said before, the constant depends on $\varepsilon$, through $|L|$, which can be of the order of $(2/\varepsilon)^{|J|}$.

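d) The reduction of the approachability of a convex set $C \subset \mathbb{R}^d$ in a game $\Gamma$ to the approachability of an orthant in an auxiliary game $\Gamma'$ can also be done via the following scheme: for every $\varepsilon > 0$, find a finite set of half-spaces $\{H(l), l \in L\}$ such that $C \subset \bigcap_{l \in L} H(l) \subset C^\varepsilon$. For every $l \in L$, define $c(l) \in \mathbb{R}^d$ and $b(l) \in \mathbb{R}$ such that

$$H(l) = \{\omega \in \mathbb{R}^d, \ \langle \omega, c(l)\rangle \le b(l)\},$$

and the auxiliary game $\Gamma'$ with payoffs defined by

$$p'(i, j) = \big(\langle p(i, j), c(l)\rangle - b(l)\big)_{l \in L} \in \mathbb{R}^{|L|}.$$

Obviously, a strategy that approaches the negative orthant in $\Gamma'$ will approach, in the game $\Gamma$, the set $\bigcap_{l \in L} H(l)$ and therefore $C^\varepsilon$. However, such a strategy might not be based on regret and might not be explicit.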
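The scheme in d) can be sketched for the Euclidean unit disk $C \subset \mathbb{R}^2$ (a hypothetical example). Tangent half-planes in $L$ equally spaced directions give a regular $L$-gon containing $C$, whose vertices lie at distance $1/\cos(\pi/L) - 1$ from $C$, so the polygon lies in $C^\varepsilon$ once this quantity is below $\varepsilon$. The function names are ours.

```python
import math

import numpy as np


def disk_half_spaces(eps):
    """Half-planes H(l) = {w : <w, c(l)> <= b(l)} whose intersection contains
    the unit disk C and is contained in C^eps."""
    L = 3
    while 1.0 / math.cos(math.pi / L) - 1.0 >= eps:   # distance from a vertex to C
        L += 1
    angles = 2.0 * math.pi * np.arange(L) / L
    c = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    b = np.ones(L)
    return c, b


def auxiliary_payoffs(payoffs, c, b):
    """p'(i, j) = (<p(i, j), c(l)> - b(l))_{l in L}, for payoffs of shape (|I|, |J|, 2)."""
    return payoffs @ c.T - b


c, b = disk_half_spaces(0.05)
print(len(b))   # 11 half-planes
```

Because $p'$ is affine in $p$, the average of $p'$ equals $p'$ evaluated at $\bar{p}_n$, which is why approaching the negative orthant in $\Gamma'$ is enough.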
2 Internal regret in the partial monitoring framework

Consider a two-person game repeated in discrete time. At stage $n \in \mathbb{N}$, Player 1 (resp. Player 2) chooses $i_n \in I$ (resp. $j_n \in J$), which generates the payoff $p_n = p(i_n, j_n)$, where $p$ is a mapping from $I \times J$ to $\mathbb{R}$. Player 1 does not observe this payoff; he receives a signal $s_n \in S$ whose law is $s(i_n, j_n)$, where $s$ is a mapping from $I \times J$ to $\Delta(S)$. The three sets $I$, $J$ and $S$ are finite and the two functions $p$ and $s$ are extended to $\Delta(I) \times \Delta(J)$ by $p(x, y) = \mathbb{E}_{x,y}[p(i, j)]$ and $s(x, y) = \mathbb{E}_{x,y}[s(i, j)] \in \Delta(S)$.

We define the mapping $s$ from $\Delta(J)$ to $\Delta(S)^I$ by $s(y) = (s(i, y))_{i \in I}$, and we call such a vector of probabilities a flag. Player 1 cannot distinguish between two different probabilities $y$ and $y'$ in $\Delta(J)$ that induce the same flag $\mu \in \Delta(S)^I$, i.e. such that $\mu = s(y) = s(y')$. Thus we say that $\mu = s(y)$, although unobserved, is the relevant or maximal information available to Player 1 about the choice of Player 2. We stress that a flag $\mu$ is not observed since, given $x \in \Delta(I)$ and $y \in \Delta(J)$, Player 1 only has information about $\mu^i$, which is only one component of $\mu$ (the $i$-th one, where $i$ is the realization of $x$). Moreover, this component is the law of a random variable whose realization (i.e. the signal $s \in S$) is the only observation of Player 1.

Example 2.1 (Label efficient prediction) Consider the following game (Example 6.4 in Cesa-Bianchi and Lugosi [5]). Nature chooses an outcome G or B and Player 1 can either observe the actual outcome (action o) or choose not to observe it and pick a label g or b. If he chooses the right label, his payoff is 1, and otherwise 0. Payoffs and laws of signals received by Player 1 are summarized by the following matrices (where a, b and c are three different probabilities over a finite set $S$):

$$\text{Payoffs:}\ \begin{array}{c|cc} & G & B \\ \hline o & 0 & 0 \\ g & 1 & 0 \\ b & 0 & 1 \end{array} \qquad\qquad \text{Signals:}\ \begin{array}{c|cc} & G & B \\ \hline o & a & b \\ g & c & c \\ b & c & c \end{array}$$

Action G, whose best response is g, generates the flag (a, c, c) and action B, whose best response is b, generates the flag (b, c, c). In order to distinguish between those two actions, Player 1 needs to know the entire flag and therefore to know $s(o, y)$, although action o is never a best response (it is said to be purely informative).
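Flags are straightforward to compute from the signal laws. A minimal sketch for Example 2.1, assuming (hypothetically, since the example leaves them unspecified) that a, b and c are the Dirac masses on three signals; `flag` is our name for the mapping $y \mapsto s(y)$.

```python
import numpy as np


def flag(signal_laws, y):
    """s(y) = (s(i, y))_{i in I}; signal_laws[i, j] is the law s(i, j) in Delta(S)."""
    return np.einsum("ijs,j->is", signal_laws, np.asarray(y, dtype=float))


# Label efficient prediction, with the assumed choice a = delta_0, b = delta_1, c = delta_2.
a, b, c = np.eye(3)
# Rows: actions o, g, b of Player 1; columns: outcomes G, B.
signal_laws = np.array([[a, b],
                        [c, c],
                        [c, c]])
flag_G = flag(signal_laws, [1.0, 0.0])   # rows (a, c, c)
flag_B = flag(signal_laws, [0.0, 1.0])   # rows (b, c, c)
print(np.any(flag_G != flag_B, axis=1))  # [ True False False]: only action o separates G and B
```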
As usual, a behavioral strategy $\sigma$ of Player 1 (resp. $\tau$ of Player 2) is a function from the set of finite histories for Player 1, $H^1 = \bigcup_{n \in \mathbb{N}} (I \times S)^n$, to $\Delta(I)$ (resp. from $H^2 = \bigcup_{n \in \mathbb{N}} (I \times S \times J)^n$ to $\Delta(J)$). A couple $(\sigma, \tau)$ generates a probability $\mathbb{P}_{\sigma,\tau}$ over $\mathcal{H} = (I \times S \times J)^{\mathbb{N}}$.

2.1 External regret

Rustichini [25] defined external consistency in the partial monitoring framework as follows: a strategy $\sigma$ of Player 1 has no external regret if, $\mathbb{P}_{\sigma,\tau}$-a.s.:

$$\limsup_{n \to +\infty}\left(\max_{x \in \Delta(I)}\ \min_{y \in \Delta(J),\ s(y) = \overline{s(j)}_n} p(x, y) - \bar{p}_n\right) \le 0,$$

where $\overline{s(j)}_n = \frac{1}{n}\sum_{m=1}^{n} s(j_m) \in \Delta(S)^I$ is the average flag. In words, the average payoff of Player 1 could not have been uniformly better had he known the average distribution of flags before the beginning of the game.

Given a flag $\mu \in \Delta(S)^I$, the function $\min_{y \in s^{-1}(\mu)} p(\cdot, y)$ may not be linear. So the best response of Player 1 might not be a pure action in $I$ but a mixed action $x \in \Delta(I)$, and any pure action in the support of $x$ may be a bad response. This explains why, in Rustichini's definition, the maximum is taken over $\Delta(I)$ and not just over $I$ as in the usual definition of external regret.

Example 2.2 (Matching Penny in the dark) Player 1 chooses either Tails or Heads and flips a coin. Simultaneously, Nature chooses on which face the coin will land. If Player 1 guessed correctly his payoff equals 1, otherwise $-1$. We assume that Player 1 does not observe the coin. Payoffs and signals are summarized in the following matrices:

$$\text{Payoffs:}\ \begin{array}{c|cc} & T & H \\ \hline T & 1 & -1 \\ H & -1 & 1 \end{array} \qquad\qquad \text{Signals:}\ \begin{array}{c|cc} & T & H \\ \hline T & c & c \\ H & c & c \end{array}$$

Every choice of Nature generates the same flag $(c, c)$. So $\min_{y \in \Delta(J)} p(x, y)$ is always non-positive and equals zero only if $x = (1/2, 1/2)$. Therefore the only best response of Player 1 is $(1/2, 1/2)$, while both T and H give the worst payoff of $-1$.
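The computation in Example 2.2 can be reproduced directly. A minimal sketch, assuming NumPy (the helper name is ours): since every choice of Nature induces the flag $(c, c)$, the worst case is a minimum over all of $\Delta(J)$, attained at a pure action, and it equals $-|x_T - x_H|$.

```python
import numpy as np

# Matching Penny in the dark: rows T, H for Player 1, columns T, H for Nature.
payoffs = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])


def worst_case_payoff(x):
    """min over y with s(y) = (c, c) of p(x, y).

    Every y gives the same flag here, so the minimum runs over all of Delta(J);
    p(x, .) is linear, hence it is attained at a pure action of Nature.
    """
    return np.min(np.asarray(x, dtype=float) @ payoffs)


for x in ([1.0, 0.0], [0.7, 0.3], [0.5, 0.5]):
    print(x, worst_case_payoff(x))   # -1.0, then about -0.4, then 0.0
```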
2.2 Internal regret

We consider here a generalization of the previous framework: at stage $n \in \mathbb{N}$, Player 2 chooses a flag $\mu_n \in \Delta(S)^I$ while Player 1 chooses an action $i_n \in I$ and receives a signal $s_n$ whose law is the $i_n$-th coordinate of $\mu_n$. Given a flag $\mu$ and $x \in \Delta(I)$, Player 1 evaluates the payoff through an evaluation function $G$ from $\Delta(I) \times \Delta(S)^I$ to $\mathbb{R}$, which is not necessarily linear.

Recall that with full monitoring, a strategy has no internal regret if each action $i\in I$ is a best response to the average empirical observation on the set of stages where $i$ was actually played. With partial monitoring, best responses are elements of $\Delta(I)$ and not elements of $I$, so if we want to define internal regret in this framework, we have to distinguish the stages not as a function of the action actually played (i.e. $i_n\in I$) but as a function of its law (i.e. $x_n\in\Delta(I)$). We assume that the strategy of Player 1 can be described by a finite family $\{x(l)\in\Delta(I),\ l\in L\}$ such that, at stage $n\in\mathbb{N}$, Player 1 chooses a type $l_n$ and, given this choice, $i_n$ is drawn according to $x(l_n)$. We assume that $L$ is finite since otherwise Player 1 would have trivial strategies that guarantee that the frequency of every $l$ converges to zero. Note that since the choices of $l_n$ can be random, any behavioral strategy can be described in such a way.

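Definition 2.3 (Lehrer-Solan [17]) For every $n\in\mathbb{N}$ and every $l\in L$, the average internal regret of type $l$ at stage $n$ is

$$\mathcal{R}_n(l)=\sup_{x\in\Delta(I)}\Big[G\big(x,\bar{\mu}_n(l)\big)-G\big(\bar{\imath}_n(l),\bar{\mu}_n(l)\big)\Big].$$

A strategy $\sigma$ of Player 1 is $(L,\varepsilon)$-internally consistent if for every strategy $\tau$ of Player 2:

$$\limsup_{n\to+\infty}\frac{|N_n(l)|}{n}\Big(\mathcal{R}_n(l)-\varepsilon\Big)\le 0,\qquad\forall l\in L,\ \mathbb{P}_{\sigma,\tau}\text{-as}.$$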
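To make Definition 2.3 concrete, here is a minimal sketch, assuming two actions (so a mixed action is encoded by its weight $p$ on the first one) and replacing the supremum over $\Delta(I)$ by a maximum over a finite grid; the function name `weighted_internal_regret`, the data layout and the dummy one-coordinate flags are hypothetical. It computes $\frac{|N_n(l)|}{n}\mathcal{R}_n(l)$ for each type from the sequences of types, actions and flags, with the worst-case evaluation of Example 2.2 playing the role of $G$.

```python
def weighted_internal_regret(G, types, actions, flags, n_grid=100):
    """Approximate |N_n(l)|/n * R_n(l) for every type l (Definition 2.3).

    Illustrative assumptions, not from the text: |I| = 2, so a mixed action is
    encoded by its weight p on action 0; actions[m] in {0, 1} is i_m; flags[m]
    is mu_m flattened to a tuple of floats; the sup over Delta(I) is replaced
    by a max over the grid p = 0, 1/n_grid, ..., 1.
    """
    n = len(types)
    regrets = {}
    for l in set(types):
        stages = [m for m in range(n) if types[m] == l]              # N_n(l)
        size = len(stages)
        i_bar = sum(1 for m in stages if actions[m] == 0) / size     # i_bar_n(l)
        mu_bar = tuple(sum(flags[m][d] for m in stages) / size       # mu_bar_n(l)
                       for d in range(len(flags[0])))
        best = max(G(k / n_grid, mu_bar) for k in range(n_grid + 1))
        regrets[l] = size / n * (best - G(i_bar, mu_bar))
    return regrets


def worst_case_mp(p, mu):
    # Worst-case evaluation of Example 2.2; there it does not depend on the flag.
    return -abs(2 * p - 1)


print(weighted_internal_regret(worst_case_mp, [0, 0, 1, 1], [0, 0, 0, 1], [(1.0,)] * 4))
# {0: 0.5, 1: 0.0}
```

Type 0 always played the pure action T, which is a bad response in this example, so it carries all the regret; type 1 played T and H equally often, which is the best response.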

Remark 2.4 Note that this definition, unlike in the full monitoring case, is not intrinsic. It depends on the choice (which can be assumed to be made by Player 1) of $\{x(l),\ l\in L\}$, and is based only on the potential observations of Player 1 (i.e. the sequences of flags $(\mu_n)_{n\in\mathbb{N}}$).

Remark 2.5 The average flag $\bar{\mu}_n$ belongs to $\Delta(S)^I$ and is defined by $\bar{\mu}_n[s]=\frac{1}{n}\sum_{m=1}^{n}\mu_m[s]$ for every $s\in S$.

In order to construct $(L,\varepsilon)$-internally consistent strategies, some regularity of $G$ is required:
Assumption 1 For every $\varepsilon>0$, there exist two finite families $\{\mu(l)\in\Delta(S)^I,\ x(l)\in\Delta(I),\ l\in L\}$ and $\eta,\delta>0$ such that:

1. $\Delta(S)^I\subset\bigcup_{l\in L}B(\mu(l),\delta)$;

2. For every $l\in L$, if $\|x-x(l)\|\le 2\eta$ and $\|\mu-\mu(l)\|\le 2\delta$, then $x\in BR_\varepsilon(\mu)$,

where $BR_\varepsilon(\mu)=\{x\in\Delta(I):\ G(x,\mu)\ge\sup_{z\in\Delta(I)}G(z,\mu)-\varepsilon\}$ is the set of $\varepsilon$-best responses to $\mu\in\Delta(S)^I$ and $B(\mu,\delta)=\{\mu'\in\Delta(S)^I:\ \|\mu'-\mu\|<\delta\}$.

In words, Assumption 1 implies that $G$ is regular with respect to $\mu$ and with respect to $x$: given $\varepsilon$, the set of flags can be covered by a finite number of balls centered at $\{\mu(l),\ l\in L\}$, such that $x(l)$ is an $\varepsilon$-best response to any $\mu$ in the ball centered at $\mu(l)$. Moreover, if $x$ is close enough to $x(l)$, then $x$ is also an $\varepsilon$-best response to any $\mu$ close to $\mu(l)$. Without loss of generality, we can assume that $x(l)$ is different from $x(l')$ for any $l\neq l'$.
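Item 1 of Assumption 1 only asks for a finite cover of $\Delta(S)^I$. One simple way to build one, sketched below under the assumption that distances are measured in the sup norm (the text does not fix the norm), is to take all points whose coordinates are multiples of $1/K$ in each copy of $\Delta(S)$; largest-remainder rounding then maps any flag to a grid point at sup-distance less than $1/K$, so $\delta=1/K$ works. The helpers `simplex_grid`, `round_to_grid` and `round_flag` are illustrative names.

```python
import itertools
import math


def simplex_grid(n_signals, K):
    """All points of Delta(S) whose coordinates are multiples of 1/K."""
    points = []
    # Stars and bars: place n_signals - 1 bars among K + n_signals - 1 slots;
    # the numbers of free slots between consecutive bars are the counts.
    for bars in itertools.combinations(range(K + n_signals - 1), n_signals - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(K + n_signals - 2 - prev)
        points.append(tuple(c / K for c in counts))
    return points


def round_to_grid(p, K):
    """Largest-remainder rounding of p in Delta(S) onto the 1/K grid.

    Each coordinate becomes floor(K p_s) or floor(K p_s) + 1 (the latter only
    where the remainder is positive), so the sup-norm error is below 1/K.
    """
    scaled = [K * q for q in p]
    counts = [math.floor(v) for v in scaled]
    missing = K - sum(counts)
    # Hand the missing units to the coordinates with the largest remainders.
    order = sorted(range(len(p)), key=lambda s: scaled[s] - counts[s], reverse=True)
    for s in order[:missing]:
        counts[s] += 1
    return tuple(c / K for c in counts)


def round_flag(mu, K):
    """Round a flag in Delta(S)^I row by row (one simplex per action)."""
    return tuple(round_to_grid(row, K) for row in mu)


print(len(simplex_grid(3, 10)))                  # 66 = C(12, 2)
print(round_to_grid([0.26, 0.33, 0.41], 10))     # (0.3, 0.3, 0.4)
```

The grid of $\Delta(S)^I$ is the $|I|$-fold product of `simplex_grid`, so its size grows like $K^{(|S|-1)|I|}$; this is one reason why the constants of the construction depend heavily on $\varepsilon$.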

Theorem 2.6 Under Assumption 1, there exist $(L,\varepsilon)$-internally consistent strategies.

Some parts of the proof are quite technical; however, the insight is very simple, so we first give the main ideas. Assume for the moment that Player 1 fully observes the flag at each stage. If, in the one-stage game, Player 2 plays first and his choice generates a flag $\mu\in\Delta(S)^I$, then Player 1 has an action $x\in\Delta(I)$ such that $x$ belongs to $BR_\varepsilon(\mu)$. Using a minmax argument (as Blackwell did for the proof of Theorem 1.8; recall Remark 1.6 b)), one could prove that Player 1 has an $(L,\varepsilon)$-internally consistent strategy (as did Lehrer and Solan [17]).

The idea is to use calibration to transform this implicit proof into a constructive one, as in the alternative proof of Corollary 1.8. Fix $\varepsilon>0$ and consider the game where Player 1 predicts the sequence $(\mu_n)_{n\in\mathbb{N}}$ using the $\delta$-grid $\{\mu(l),\ l\in L\}$ given by Assumption 1. A calibrated strategy of Player 1 chooses a sequence $(l_n)_{n\in\mathbb{N}}$ in such a way that $\bar{\mu}_n(l)$ is asymptotically $\delta$-close to $\mu(l)$. Hence Player 1 just has to play according to $x(l)\in BR_\varepsilon(\mu(l))$ on these stages.

Indeed, since the choices of actions are independent, $\bar{\imath}_n(l)$ will be asymptotically $\eta$-close to $x(l)$, and the regularity of $G$ will then imply that $\bar{\imath}_n(l)\in BR_\varepsilon(\bar{\mu}_n(l))$, so the strategy will be $(L,\varepsilon)$-internally consistent.

The only issue is that in the current framework the signal depends on the action of Player 1, since the law of $s_n$ is the $i_n$-th component of $\mu_n$, which is not observed. Signals (which belong to $S$) and predictions (which belong to $\Delta(S)^I$) are in two different spaces, so the existence of calibrated strategies is not straightforward. However, it is well known that, up to a slight perturbation of $x(l)$, the information available to Player 1 after a long time is close to $\bar{\mu}_n(l)$ (as in the multi-armed bandit problem and in some calibration and no-regret frameworks; see e.g. Cesa-Bianchi and Lugosi [5], Chapter 6, for a survey of these techniques).

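For every $x\in\Delta(I)$, define $x_\eta\in\Delta(I)$, the $\eta$-perturbation of $x$, by $x_\eta=(1-\eta)x+\eta u$ with $u$ the uniform probability over $I$, and for every $n$ define $\tilde{s}_n$ by:

$$\tilde{s}_n=\Big(0,\ldots,0,\ \frac{s_n}{x_\eta(l_n)[i_n]},\ 0,\ldots,0\Big)\in\mathbb{R}^{S\times I},$$

where $s_n$ is identified with the corresponding unit vector of $\mathbb{R}^S$ and placed at coordinate $i_n$, and $x_\eta(l_n)[i_n]\ge\eta/|I|>0$ is the weight put by $x_\eta(l_n)$ on $i_n$. We denote by $\bar{s}_n(l)$ (instead of $\bar{\tilde{s}}_n(l)$) their average on $N_n(l)$.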
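The unbiasedness computation in the proof of Lemma 2.7 can be reproduced exactly with rational arithmetic. In the sketch below (helper names are hypothetical), $\tilde{s}_n$ is stored as an $|I|\times|S|$ array; `expected_estimator` averages it over $i\sim x_\eta$ and $s\sim\mu^i$ and recovers $\mu$ exactly, even though $x$ itself is a pure action. Without the perturbation ($\eta=0$), the second action would have weight 0 and its row of $\mu$ could not be estimated at all.

```python
from fractions import Fraction


def perturb(x, eta):
    """eta-perturbation x_eta = (1 - eta) x + eta u, with u uniform on I."""
    k = len(x)
    return [(1 - eta) * xi + eta / k for xi in x]


def estimator(i, s, x_eta, n_signals):
    """tilde s_n as an |I| x |S| array: zero except entry (i, s) = 1 / x_eta[i]."""
    est = [[Fraction(0)] * n_signals for _ in x_eta]
    est[i][s] = 1 / x_eta[i]
    return est


def expected_estimator(x, eta, mu):
    """Exact E[tilde s_n] when i ~ x_eta and, given i, s ~ mu[i]."""
    x_eta = perturb(x, eta)
    n_signals = len(mu[0])
    total = [[Fraction(0)] * n_signals for _ in x]
    for i, w_i in enumerate(x_eta):
        for s, w_s in enumerate(mu[i]):
            est = estimator(i, s, x_eta, n_signals)
            for a in range(len(x)):
                for b in range(n_signals):
                    total[a][b] += w_i * w_s * est[a][b]
    return total


# A pure action x = (1, 0): row 1 of mu is only reached through the perturbation.
x = [Fraction(1), Fraction(0)]
mu = [[Fraction(1, 3), Fraction(2, 3)],
      [Fraction(1, 2), Fraction(1, 2)]]
print(expected_estimator(x, Fraction(1, 10), mu) == mu)   # True
```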

Lemma 2.7 For every $\theta>0$, there exists $N\in\mathbb{N}$ such that, for every $l\in L$:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall m\ge n,\ \|\bar{s}_m(l)-\bar{\mu}_m(l)\|\le\theta\ \Big|\ |N_n(l)|\ge N\Big)\ge 1-\theta.$$
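Proof: Since, for every $n\in\mathbb{N}$, the choices of $i_n$ and $\mu_n$ are independent:

$$\mathbb{E}_{\sigma,\tau}\big[\tilde{s}_n\,\big|\,h_{n-1},l_n,\mu_n\big]=\sum_{i\in I}x_\eta(l_n)[i]\sum_{s\in S}\mu_n^i[s]\Big(0,\ldots,\frac{s}{x_\eta(l_n)[i]},\ldots,0\Big)=\sum_{i\in I}\sum_{s\in S}\big(0,\ldots,\mu_n^i[s]\,s,\ldots,0\big)=\sum_{i\in I}\big(0,\ldots,\mu_n^i,\ldots,0\big)=\mu_n,$$

where $\mu_n$ is seen as an element of $\mathbb{R}^{SI}$. Therefore $\bar{s}_n(l)$ is an unbiased estimator of $\bar{\mu}_n(l)$, and the Hoeffding-Azuma inequality (actually its multidimensional version by Chen and White [7], together with the fact that $\sup_{n\in\mathbb{N}}\|\tilde{s}_n\|\le|I|/\eta<\infty$) implies that for every $\theta>0$ there exists $N\in\mathbb{N}$ such that, for every $l\in L$:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall m\ge n,\ \|\bar{s}_m(l)-\bar{\mu}_m(l)\|\le\theta\ \Big|\ |N_n(l)|\ge N\Big)\ge 1-\theta.$$

□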
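Lemma 2.7 can also be illustrated by simulation. The sketch below (a hypothetical helper, assuming a single type so that $N_n(l)$ contains every stage, and a fixed sequence of flags) draws $i_n\sim x_\eta$ and $s_n\sim\mu_n^{i_n}$, accumulates $\tilde{s}_n$, and returns the sup-norm distance between $\bar{s}_n$ and $\bar{\mu}_n$. The row of the flag attached to the action that $x$ never plays is sampled only through the $\eta$-perturbation, so its estimate is the noisiest.

```python
import random


def estimation_error(x, eta, flags, seed=0):
    """Sup-norm distance between the average of tilde s_1..tilde s_n and the
    average flag mu_bar_n, for one type l played at every stage.

    Illustrative assumptions: a single type, a fixed mixed action x and a given
    sequence of flags; each flag is a list of rows, row i being the law of the
    signal when action i is played.
    """
    rng = random.Random(seed)
    k, n_signals = len(x), len(flags[0][0])
    x_eta = [(1 - eta) * xi + eta / k for xi in x]
    est_sum = [[0.0] * n_signals for _ in range(k)]
    mu_sum = [[0.0] * n_signals for _ in range(k)]
    for mu in flags:
        i = rng.choices(range(k), weights=x_eta)[0]            # i_n ~ x_eta
        s = rng.choices(range(n_signals), weights=mu[i])[0]    # s_n ~ mu_n^{i_n}
        est_sum[i][s] += 1 / x_eta[i]                          # add tilde s_n
        for a in range(k):
            for b in range(n_signals):
                mu_sum[a][b] += mu[a][b]
    n = len(flags)
    return max(abs(est_sum[a][b] - mu_sum[a][b]) / n
               for a in range(k) for b in range(n_signals))


# Flags alternate between two values; action 1 is played only through eta.
flags = [[[0.2, 0.8], [0.5, 0.5]] if m % 2 == 0 else [[0.6, 0.4], [0.9, 0.1]]
         for m in range(50000)]
print(estimation_error([1.0, 0.0], 0.1, flags))
```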

Assume now that Player 1 uses a calibrated strategy to predict the sequence of $\tilde{s}_n$ (this game is in full monitoring); then he knows that asymptotically $\bar{s}_n(l)$ is closer to $\mu(l)$ than to any $\mu(k)$ (as soon as the frequency of $l$ is big enough), and therefore it is $\delta$-close to $\mu(l)$. Lemma 2.7 implies that $\bar{\mu}_n(l)$ is asymptotically close to $\bar{s}_n(l)$, and therefore $2\delta$-close to $\mu(l)$.
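The calibration condition (13) used in the proof below is a property of the empirical data that can be checked directly. A minimal sketch, with hypothetical names and with flags flattened to tuples of floats: `calibration_gap` returns the largest value over $l,k\in L$ of $\frac{|N_n(l)|}{n}\big(\|\bar{s}_n(l)-\mu(l)\|^2-\|\bar{s}_n(l)-\mu(k)\|^2\big)$, which is at least 0 (take $k=l$) and which a calibrated strategy keeps below $\theta$ for large $n$. Every announced type must have an entry in `grid`.

```python
def calibration_gap(types, estimates, grid):
    """max over l, k of |N_n(l)|/n * (||s_bar_n(l) - mu(l)||^2 - ||s_bar_n(l) - mu(k)||^2).

    types is the sequence l_1, ..., l_n; estimates are the vectors
    tilde s_1, ..., tilde s_n flattened to tuples of floats; grid maps each
    type l to mu(l), flattened the same way.
    """
    n = len(types)
    dim = len(estimates[0])

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    gap = float("-inf")
    for l in set(types):
        stages = [m for m in range(n) if types[m] == l]          # N_n(l)
        s_bar = [sum(estimates[m][d] for m in stages) / len(stages) for d in range(dim)]
        for mu_k in grid.values():
            score = len(stages) / n * (sq_dist(s_bar, grid[l]) - sq_dist(s_bar, mu_k))
            gap = max(gap, score)
    return gap


grid = {0: (0.0,), 1: (1.0,)}
# Each type's estimates stay near its own grid point: the gap is 0.
print(calibration_gap([0, 0, 1, 1], [(0.1,), (0.3,), (0.9,), (0.7,)], grid))  # 0.0
# Type 0 is announced while the estimates are near mu(1): the gap is large.
print(calibration_gap([0, 0], [(0.9,), (1.0,)], grid))
```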

Proof of Theorem 2.6: Let the families $\{x(l)\in\Delta(I),\ \mu(l)\in\Delta(S)^I,\ l\in L\}$ and $\eta,\delta>0$ be given by Assumption 1 for a fixed $\varepsilon>0$.

Let $\Gamma'$ be the auxiliary repeated game where, at stage $n$, Player 1 (resp. Player 2) chooses $l_n\in L$ (resp. $\mu_n\in\Delta(S)^I$). Given these choices, $i_n$ (resp. $s_n$) is drawn according to $x_\eta(l_n)$ (resp. $\mu_n^{i_n}$). By Lemma 2.7, for every $\theta>0$, there exists $N_1\in\mathbb{N}$ such that for every $l\in L$:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall m\ge n,\ \|\bar{s}_m(l)-\bar{\mu}_m(l)\|\le\theta\ \Big|\ |N_n(l)|\ge N_1\Big)\ge 1-\theta.\tag{12}$$

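Let $\sigma$ be a calibrated strategy associated to $(\tilde{s}_n)_{n\in\mathbb{N}}$ in $\Gamma'$. For every $\theta>0$, there exists $N_2$ such that, with $\mathbb{P}_{\sigma,\tau}$-probability greater than $1-\theta$:

$$\forall n\ge N_2,\ \forall l,k\in L,\quad\frac{|N_n(l)|}{n}\Big(\|\bar{s}_n(l)-\mu(l)\|^2-\|\bar{s}_n(l)-\mu(k)\|^2\Big)\le\theta.\tag{13}$$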



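Since $\{\mu(k),\ k\in L\}$ is a $\delta$-grid of $\Delta(S)^I$, for every $n\in\mathbb{N}$ and $l\in L$, there exists $k\in L$ such that $\|\bar{s}_n(l)-\mu(k)\|<\delta$. Therefore, combining equations (12) and (13), for every $\theta>0$ there exists $N_3$ such that:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall n\ge N_3,\ \forall l\in L,\ \frac{|N_n(l)|}{n}\Big(\|\bar{\mu}_n(l)-\mu(l)\|^2-\delta^2\Big)\le\theta\Big)\ge 1-\theta.\tag{14}$$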

For every stage of type $l\in L$, $i_n$ is drawn according to $x_\eta(l)$ and, by definition, $\|x_\eta(l)-x(l)\|\le\eta$. Therefore the Hoeffding-Azuma inequality implies that for every $\theta>0$ there exists $N_4\in\mathbb{N}$ such that:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall n\ge N_4,\ \forall l\in L,\ \frac{|N_n(l)|}{n}\Big(\|\bar{\imath}_n(l)-x(l)\|-\eta\Big)\le\theta\Big)\ge 1-\theta.\tag{15}$$

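Combining equations (14) and (15) and using Assumption 1, for every $\theta>0$ there exists $N\in\mathbb{N}$ such that for every strategy $\tau$ of Player 2:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall n\ge N,\ \forall l\in L,\ \frac{|N_n(l)|}{n}\Big(\mathcal{R}_n(l)-\varepsilon\Big)\le\theta\Big)\ge 1-\theta,\tag{16}$$

and $\sigma$ is $(L,\varepsilon)$-internally consistent. □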



Remark 2.8 Lugosi, Mannor and Stoltz [19] provided an algorithm that constructs, by blocks of size $m\in\mathbb{N}$, a strategy that has no external regret. We can describe it as follows. Play at every stage of the $k$-th block according to the same probability $x_k\in\Delta(I)$. Then compute (using Lemma 2.7) an estimator of the average flag on this block and denote it by $\bar{\mu}_k$. Knowing this flag, compute the average regret accumulated on this specific block and aggregate it to the previous regret in order to estimate the average regret from the beginning of the game. Then decide which action is going to be played on the following block according to a classical exponential weight algorithm. With a fine tuning of $m\in\mathbb{N}$ (and $\eta>0$), the external regret of this strategy converges to zero at the rate $O(n^{-1/5})$ (the optimal rate is known to be …)
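The exponential weight step mentioned in Remark 2.8 can be sketched as follows. This is only the generic update, not the full algorithm of Lugosi, Mannor and Stoltz: the per-block gains of each expert (in their setting, finitely many mixed actions evaluated on the estimated block flag) are assumed to be given, and the name `exponential_weights` and the parameter `learning_rate` are illustrative.

```python
import math


def exponential_weights(block_gains, learning_rate):
    """Exponential weights over a finite set of experts, updated once per block.

    block_gains[k][e] is the (estimated) gain of expert e on block k. Returns
    the probability vector over experts used on each block: the distribution
    for block k depends only on the gains of blocks 0..k-1.
    """
    n_experts = len(block_gains[0])
    cumulative = [0.0] * n_experts
    distributions = []
    for gains in block_gains:
        # Weights proportional to exp(learning_rate * cumulative gain);
        # subtracting the maximum avoids overflow without changing the ratios.
        top = max(cumulative)
        weights = [math.exp(learning_rate * (c - top)) for c in cumulative]
        total = sum(weights)
        distributions.append([w / total for w in weights])
        # The gains of the block are revealed only after the choice.
        for e in range(n_experts):
            cumulative[e] += gains[e]
    return distributions


# Expert 1 is always better; the weights concentrate on it.
dists = exponential_weights([[0.0, 1.0]] * 50, learning_rate=0.5)
print(dists[0], dists[-1][1] > 0.99)   # [0.5, 0.5] True
```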

Instead of trying to compute (or at least approximate) the sequence of payoffs from the sequence of signals, our algorithm considers an abstract auxiliary game defined on the signal space (i.e. on the relevant information, the observations). We define payoffs in this abstract game in order to transform it into a game with full monitoring: the actions of Player 2 are flags, which are (almost) observed by Player 1.

The strategy constructed is based on $\delta$-calibration and the Hoeffding-Azuma inequality; therefore one can show that:

$$\mathbb{E}_{\sigma,\tau}\Big[\sup_{l\in L}\frac{|N_n(l)|}{n}\,\mathcal{R}_n(l)\Big]\le\ \ldots$$

So, given $\varepsilon>0$, one can construct a strategy such that the internal regret converges quickly to $\varepsilon$, but maybe very slowly to zero (because the constants depend, once again, drastically on $\varepsilon$).



Remark 2.9 Since $\bar{s}_n$ converges to $\bar{\mu}_n$, the regret can be defined in terms of observed empirical flags instead of the unobserved average flag. For the same reason, $x(l)$ can be used instead of $\bar{\imath}_n(l)$ to define regret.



2.3 On the strategy space 

One might object that behavioral strategies of Player 1 are defined as mappings from the set of past histories $H^1=\bigcup_{n\in\mathbb{N}}(I\times S)^n$ into $\Delta(I)$, while in Definition 2.3 (and Theorem 2.6) the strategies considered are defined as mappings from $\bigcup_{n\in\mathbb{N}}(I\times S\times L)^n$ into $\Delta(L)$, with the specification that, given $l_n\in L$, the law of $i_n$ is $x(l_n)$, for a fixed family $\{x(l),\ l\in L\}$. Hence, they can be defined as mappings from $\bigcup_{n\in\mathbb{N}}(I\times X\times S)^n$ into $\Delta(X)$ (where $X=\Delta(I)$ and $\Delta(X)$ is endowed with the weak-star topology) and thus are behavioral strategies in the game where Player 1's action set is $X$ and he receives at each stage a signal in $I\times S$.

Therefore, they are equivalent to (i.e., following Mertens, Sorin and Zamir [21], Theorem 1.8 p. 55, they generate the same probability on the set of plays as) mixed strategies, which are mixtures of pure strategies, i.e. mappings from $\bigcup_{n\in\mathbb{N}}(I\times X\times S)^n$ into $X$. The latter are equivalent to mappings from $\bigcup_{n\in\mathbb{N}}(I\times S)^n$ into $X$. Indeed, consider for example $\sigma:\bigcup_{n\in\mathbb{N}}(I\times X\times S)^n\to X$ and define $\tilde{\sigma}:\bigcup_{n\in\mathbb{N}}(I\times S)^n\to X$ recursively by $\tilde{\sigma}(\emptyset)=\sigma(\emptyset)$ and

$$\tilde{\sigma}(t_1,\ldots,t_n)=\sigma\big(\tilde{\sigma}(\emptyset),t_1,\ldots,\tilde{\sigma}(t_1,\ldots,t_{n-1}),t_n\big).$$
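The recursion defining $\tilde{\sigma}$ can be written out directly. In the sketch below, which uses a finite set of labels in place of $X=\Delta(I)$ purely for illustration (all names are hypothetical), `sigma` reads the interleaved history $(x_1,t_1,\ldots,x_n,t_n)$ and `remove_own_choices` returns $\tilde{\sigma}$, which reads only $(t_1,\ldots,t_n)$ and recomputes the past choices itself; this works because the own choices of a pure strategy are determined by the game history.

```python
from functools import lru_cache


def remove_own_choices(sigma):
    """Build sigma_tilde from a pure strategy sigma that also reads its own choices.

    sigma maps an interleaved history (x_1, t_1, ..., x_n, t_n) to the next
    choice x_{n+1}; sigma_tilde maps (t_1, ..., t_n) alone to the same choice,
    recomputing x_k = sigma_tilde(t_1, ..., t_{k-1}) along the way.
    """
    @lru_cache(maxsize=None)
    def sigma_tilde(ts):
        history = ()
        for k in range(len(ts)):
            history += (sigma_tilde(ts[:k]), ts[k])
        return sigma(history)

    return sigma_tilde


def example_sigma(history):
    # Hypothetical strategy: start with "a"; switch label after signal 1.
    if not history:
        return "a"
    last_choice, (_, last_signal) = history[-2], history[-1]
    if last_signal == 1:
        return "b" if last_choice == "a" else "a"
    return last_choice


sigma_tilde = remove_own_choices(example_sigma)
print(sigma_tilde(((0, 1), (0, 0), (1, 1))))   # a
```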

Finally, in the game where Player 1's action set is $I$ and he receives at each stage a signal in $S$, they are mixtures of behavioral strategies (also called general strategies), and so they are equivalent to behavioral strategies.

In conclusion, given a strategy defined as in Definition 2.3, there exists a behavioral strategy that generates the same probability on the set of plays (for every strategy $\tau$ of Player 2).

In these general strategies, Player 1 uses two types of signals: the signals generated by the game, i.e. the sequence $(i_n,s_n)_{n\in\mathbb{N}}$, and some private signals generated by his own strategy, i.e. the sequence of $l_n$. We can compute internal regret in Theorem 2.6 not only because the choices of $\mu_n$ and $l_n$ are independent given the past, but mainly because the choices of $\mu_n$ and $i_n$ are independent, even when $l_n$ is known.

3 Back to the payoff space

In this section we give a simple condition on $G$ that ensures it fulfills Assumption 1. We also extend the framework to the so-called compact case. Finally, we prove that an internally consistent strategy (in a sense to be specified) is also externally consistent.

3.1 The worst case fulfills Assumption 1

Proposition 3.1 Let $G:\Delta(I)\times\Delta(S)^I\to\mathbb{R}$ be such that for every $\mu\in\Delta(S)^I$, $G(\cdot,\mu)$ is continuous and the family $\{G(x,\cdot),\ x\in\Delta(I)\}$ is equicontinuous. Then $G$ fulfills Assumption 1.

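Proof: Since $\{G(x,\cdot),\ x\in\Delta(I)\}$ is equicontinuous and $\Delta(S)^I$ is compact, for every $\varepsilon>0$ there exists $\delta>0$ such that:

$$\forall x\in\Delta(I),\ \forall\mu,\mu'\in\Delta(S)^I,\quad\|\mu-\mu'\|\le 2\delta\ \Rightarrow\ |G(x,\mu)-G(x,\mu')|\le\frac{\varepsilon}{2}.$$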
Let $\{\mu(l),\ l\in L\}$ be a finite $\delta$-grid of $\Delta(S)^I$ and, for every $l\in L$, let $x(l)\in BR(\mu(l))$, so that $G(x(l),\mu(l))=\max_{z\in\Delta(I)}G(z,\mu(l))$. Since $G(\cdot,\mu(l))$ is continuous, there exists $\eta(l)>0$ such that:

$$\|x-x(l)\|\le 2\eta(l)\ \Rightarrow\ |G(x,\mu(l))-G(x(l),\mu(l))|\le\frac{\varepsilon}{2}.$$

Define $\eta=\min_{l\in L}\eta(l)$ and let $x\in\Delta(I)$, $\mu\in\Delta(S)^I$ and $l\in L$ be such that $\|x-x(l)\|\le 2\eta$ and $\|\mu-\mu(l)\|\le 2\delta$; then:

$$G(x,\mu)\ge G(x,\mu(l))-\frac{\varepsilon}{2}\ge G(x(l),\mu(l))-\varepsilon=\max_{z\in\Delta(I)}G(z,\mu(l))-\varepsilon,$$

and $x\in BR_\varepsilon(\mu)$. □

This proposition implies that the evaluation function used by Rustichini fulfills Assumption 1 (see also Lugosi, Mannor and Stoltz [19], Lemma 3.1 and Proposition A.1). Before proving this, we introduce $\mathcal{S}$, the range of $s$, which is a closed convex subset of $\Delta(S)^I$, and $\Pi_{\mathcal{S}}(\cdot)$, the projection onto it.

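Corollary 3.2 Define $W:\Delta(I)\times\Delta(S)^I\to\mathbb{R}$ by:

$$W(x,\mu)=\begin{cases}\min_{y\in s^{-1}(\mu)}\rho(x,y) & \text{if }\mu\in\mathcal{S},\\ W\big(x,\Pi_{\mathcal{S}}(\mu)\big) & \text{otherwise.}\end{cases}$$

Then $W$ fulfills Assumption 1.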

Proof: We extend $s$ linearly to $\mathbb{R}^J$ by $s(y)=\sum_{j\in J}y(j)s(j)$, where $y=(y(j))_{j\in J}$. Therefore (Aubin and Frankowska [1], Theorem 2.2.1, p. 57) the multivalued application $s^{-1}:\mathcal{S}\rightrightarrows\Delta(J)$ is $\lambda$-Lipschitz and, since $\Pi_{\mathcal{S}}$ is 1-Lipschitz (because $\mathcal{S}$ is convex), $W(x,\cdot)$ is also $\lambda$-Lipschitz for every $x\in\Delta(I)$. Therefore $\{W(x,\cdot),\ x\in\Delta(I)\}$ is equicontinuous. For every $\mu\in\Delta(S)^I$, $W(\cdot,\mu)$ is $r$-Lipschitz (where $r=\|\rho\|$; see e.g. Lugosi, Mannor and Stoltz [19]), and therefore continuous. Hence, by Proposition 3.1, $W$ fulfills Assumption 1. □



3.2 Compact case 

Assumption 1 does not require that Player 1 faces only one opponent, nor that his opponents have only a finite set of actions. As long as $G$ is regular, Player 1 has an $(L,\varepsilon)$-internally consistent strategy for every $\varepsilon>0$. We consider in this section a particular framework, referred to as the compact case (as mentioned in Section 1).
Player 1's action set is still denoted by $I$, but we now assume that the action set of Player 2 is $[-1,1]^I$. The payoff mapping $\rho$ from $\Delta(I)\times[-1,1]^I$ to $\mathbb{R}$ is simply defined by $\rho(x,U)=\langle x,U\rangle$. Let $s$ be a multivalued application from $[-1,1]^I$ to $\Delta(S)^I$. Given the choices of $i$ and $U$, Player 1 does not observe $U$ but receives a signal $s\in S$, whose law is the $i$-th component of a flag $\mu$ which belongs to $s(U)$. If $s(U)$ is not a singleton, then we can assume that $\mu$ is chosen either by Nature (a third player) or by Player 2.

A multivalued application $s$ is closed-convex if $\lambda s(x)+(1-\lambda)s(z)\subset s(\lambda x+(1-\lambda)z)$ for every $\lambda\in[0,1]$ and its graph is closed. Its inverse is defined by $s^{-1}(\mu)=\{U\in[-1,1]^I:\ \mu\in s(U)\}$. It is clear that if $s$ is closed-convex then $s^{-1}$ is also closed-convex.

Proposition 3.3 Define the worst case mapping as in Corollary 3.2. If $s$ is closed-convex and its range is a polytope (the convex hull of a finite number of points), then $W$ fulfills Assumption 1.

Proof: We follow Aubin and Frankowska [1]: let $\mu_0$ be in $\mathcal{S}$, the range of $s$, let $U_0$ be in $s^{-1}(\mu_0)$, and let $g$ be the mapping defined by:

$$g:\ \mathcal{S}\to\mathbb{R},\qquad\mu\mapsto g(\mu)=\inf_{U\in s^{-1}(\mu)}\|U-U_0\|=d\big(U_0,s^{-1}(\mu)\big).$$

Since $s$ is convex, so is $s^{-1}$ (in the multivalued sense) and so is $g$ (in the univalued sense). The sections $\{\mu\,|\,g(\mu)\le\lambda\}$ are closed (see Aubin and Frankowska [1], Lemma 2.2.3 p. 59), so $g$ is lower semi-continuous. Since the domain of $g$ is a polytope, $g$ is also upper semi-continuous (see Rockafellar [24], Theorem 10.2 p. 84). Therefore $g$ is continuous over $\mathcal{S}$ and there exists $\delta(U_0)$ such that if $\|\mu-\mu_0\|\le\delta(U_0)$ then $d\big(U_0,s^{-1}(\mu)\big)\le\varepsilon$.

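Since $s^{-1}(\mu_0)$ is compact, for every $\varepsilon>0$ there exists a finite set $\mathcal{U}$ such that $s^{-1}(\mu_0)\subset\bigcup_{U\in\mathcal{U}}B(U,\varepsilon)$. Define $\delta(\mu_0)=\inf_{U\in\mathcal{U}}\delta(U)$; then for every $\mu$ in $\mathcal{S}$, $\|\mu-\mu_0\|\le\delta(\mu_0)$ implies that $s^{-1}(\mu_0)\subset s^{-1}(\mu)+2\varepsilon B$ (with $B$ the unit ball). The graph of $s^{-1}$ is compact, so for every $\varepsilon>0$ there exists $0<\delta'(\mu_0)\le\delta(\mu_0)$ such that if $\|\mu-\mu_0\|\le\delta'(\mu_0)$ then $s^{-1}(\mu)\subset s^{-1}(\mu_0)+2\varepsilon B$.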

There exists a finite set $M$ such that the compact set $\mathcal{S}$ is included in the union of open balls $\bigcup_{\mu\in M}B(\mu,\delta'(\mu)/3)$. If we denote $\delta=\inf_{\mu\in M}\delta'(\mu)/3$, then for every $\mu$ and $\mu'$ in $\mathcal{S}$ with $\|\mu-\mu'\|\le\delta$, there exists $\mu_1\in M$ such that $\mu$ and $\mu'$ belong to $B(\mu_1,\delta'(\mu_1))$; hence $s^{-1}(\mu)\subset s^{-1}(\mu_1)+2\varepsilon B\subset s^{-1}(\mu')+4\varepsilon B$.
Let $\mu$ and $\mu'$ in $\Delta(S)^I$ be such that $\|\mu-\mu'\|\le\delta$. Then, since $\mathcal{S}$ is a convex set, $\|\Pi_{\mathcal{S}}(\mu)-\Pi_{\mathcal{S}}(\mu')\|\le\delta$, so $s^{-1}(\Pi_{\mathcal{S}}(\mu))\subset s^{-1}(\Pi_{\mathcal{S}}(\mu'))+4\varepsilon B$ and, for every $x\in\Delta(I)$,

$$|W(x,\mu)-W(x,\mu')|\le 4\varepsilon.$$

Let $x$ and $x'$ in $\Delta(I)$ be such that $\|x-x'\|\le\varepsilon$; then $\langle x',U\rangle\ge\langle x,U\rangle-\varepsilon$ for every $U\in[-1,1]^I$, so that for all $\mu\in\Delta(S)^I$

$$|W(x,\mu)-W(x',\mu)|\le\varepsilon.$$

Hence if $x(l)$ is an $\varepsilon$-best response to $\mu(l)$, $\|x-x(l)\|\le\varepsilon$ and $\|\mu-\mu(l)\|\le\delta$, then

$$W(x,\mu)\ge W(x(l),\mu)-\varepsilon\ge W(x(l),\mu(l))-5\varepsilon\ge\sup_{z\in\Delta(I)}W(z,\mu(l))-6\varepsilon\ge\sup_{z\in\Delta(I)}W(z,\mu)-10\varepsilon,$$

so $x$ is a $10\varepsilon$-best response to $\mu$. □

Remark 3.4 (On the assumptions on s) $s$ is assumed to be multivalued since, in the finite case, there might be two different probabilities $y$ and $y'$ in $\Delta(J)$ that generate the same outcome vector $\rho(y)=(\rho(i,y))_{i\in I}=\rho(y')$ but two different flags $s(y)$ and $s(y')$.

It is also convex: if Player 2 can generate a flag $\mu$ by playing $y\in\Delta(J)$ and a flag $\mu'$ by playing $y'$, then a convex combination of $y$ and $y'$ should generate the same convex combination of flags. This assumption is specifically needed in repeated games: for example, Player 2 can play $y$ on odd stages and $y'$ on even stages, and Player 1 must know that the average empirical flag can be generated by $\frac{1}{2}y+\frac{1}{2}y'$.

The fact that the range of $s$ is a polytope (or at least that it is locally simplicial; see Rockafellar [24], p. 84, for formal definitions and examples) is needed for the proof that $W$ is continuous. It is obviously true in the finite-dimensional case since its graph is a polytope.

3.3 Regret in terms of actual payoffs

Following Rustichini [25], we can define regret in terms of the unobserved average payoff.

Definition 3.5 A strategy $\sigma$ of Player 1 is $(L,\varepsilon)$-internally consistent with respect to the actual payoffs if for every $l\in L$:

$$\limsup_{n\to+\infty}\frac{|N_n(l)|}{n}\Big(\sup_{x\in\Delta(I)}\big[W(x,\bar{\mu}_n(l))-\bar{\rho}_n(l)\big]-\varepsilon\Big)\le 0,\qquad\mathbb{P}_{\sigma,\tau}\text{-as},$$

where $\bar{\rho}_n(l)$ is the average payoff of Player 1 on $N_n(l)$.
Proposition 3.6 For every $\varepsilon>0$, there exist $(L,\varepsilon)$-internally consistent strategies with respect to the actual payoffs.

Proof: Consider the strategy $\sigma$ given by Theorem 2.6 with the worst case mapping. By definition of $W$ and using the independence of the choices of $x(l)$ and $\mu_n$, one can easily show that asymptotically $W(\bar{\imath}_n(l),\bar{\mu}_n(l))\le\bar{\rho}_n(l)$. Therefore the strategy $\sigma$ is also $(L,\varepsilon)$-consistent with respect to the actual payoffs. □

Now we can define 0-internally consistent strategies (see Lehrer and Solan [17], Definition 10):

Definition 3.7 A strategy $\sigma$ of Player 1 is 0-internally consistent if for every $\varepsilon>0$ there exists $\delta>0$ such that for every finite partition $\{P(l),\ l\in L\}$ of $\Delta(I)$ with diameter smaller than $\delta$ and every $l\in L$:



where $N_n(l)=\{m\le n:\ x_m\in P(l)\}$, with $x_m$ the law (which might be chosen at random by Player 1) of $i_m$ given the past history, and $\bar{\mu}_n(l)$ (resp. $\bar{\imath}_n(l)$) is the average flag (resp. action of Player 1) on $N_n(l)$.

Proposition 3.8 There exist 0-internally consistent strategies with respect 
to the actual payoffs. 

Proof: The proof relies only on a classical doubling trick (see e.g. Sorin [29], Proposition 3.2 p. 56), recalled below.

Denote by $\sigma_k$ the strategy given by Proposition 3.6 for $\varepsilon=\varepsilon_k$, where $\varepsilon_k$ is a power of 2 that decreases to 0 as $k\to\infty$. Consider the strategy $\sigma$ of Player 1 defined by blocks: on the first block, of length $N_1$, Player 1 plays according to $\sigma_1$, then on the second block, of length $N_2$, according to $\sigma_2$, and so on. Formally, for $n$ such that $\sum_{k=1}^{p-1}N_k\le n<\sum_{k=1}^{p}N_k$, $\sigma(h_n)=\sigma_p(h_n^p)$, where $h_n^p=(i_m,l_m,s_m)_{m\in\{\sum_{k=1}^{p-1}N_k,\ldots,n\}}$ is the partial history on the last block. Remark 2.8 implies that for every $p\in\mathbb{N}$ there exists $M_p\in\mathbb{N}$ such that



Let $(N_k)_{k\in\mathbb{N}}$ be a sequence such that $\sum_{p=1}^{k-1}N_p=o(N_k)$ and $M_{k+1}=o(N_k)$ (where $u_n=o(v_n)$ means that $v_n>0$ and $\lim_{n\to\infty}u_n/v_n=0$). With this definition, the $k$-th block is way longer than all the previous blocks, and longer than the time required by $\sigma_{k+1}$ to be $\varepsilon_{k+1}$-consistent (in expectation). So the (maybe high) regret accumulated during the first $M_n$ stages of the $n$-th block is negligible compared to the small regret accumulated before (during the first $n-1$ blocks). After these $M_n$ stages, the regret on the $n$-th block is smaller than $\varepsilon_n$, and at the end of this block the cumulative regret is very close to $\varepsilon$. □
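The block decomposition used in this proof can be sketched as follows. The helper below (hypothetical name; stages are numbered from 0) finds, for a stage $n$, the block $p$ with $\sum_{k<p}N_k\le n<\sum_{k\le p}N_k$ and the first stage of that block, i.e. where the partial history $h_n^p$ passed to $\sigma_p$ starts. The lengths $N_k=2^{k^2}$ are only an example of a sequence for which $\sum_{p<k}N_p=o(N_k)$; the ratios printed at the end decrease toward 0.

```python
import bisect
import itertools


def block_of_stage(n, block_lengths):
    """Locate stage n (stages numbered from 0) in the block decomposition.

    Returns (p, start): the 1-indexed block p with
    sum_{k<p} N_k <= n < sum_{k<=p} N_k, and start = sum_{k<p} N_k, the first
    stage of the partial history h_n^p that sigma_p receives.
    """
    ends = list(itertools.accumulate(block_lengths))   # ends[p-1] = N_1 + ... + N_p
    finished = bisect.bisect_right(ends, n)            # blocks that end at or before n
    if finished == len(block_lengths):
        raise ValueError("stage lies beyond the listed blocks")
    start = ends[finished - 1] if finished > 0 else 0
    return finished + 1, start


# Hypothetical lengths N_k = 2^(k^2): each block dwarfs all previous ones.
lengths = [2 ** (k * k) for k in range(1, 5)]           # [2, 16, 512, 65536]
print(block_of_stage(17, lengths), block_of_stage(18, lengths))   # (2, 2) (3, 18)
print([sum(lengths[:k]) / lengths[k] for k in range(1, len(lengths))])
```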

Remark 3.9 The use of a doubling trick prevents us from easily finding a bound on the rate of convergence of the regret. The proof of Proposition 3.8 requires that the regret on the union of two different blocks be smaller than the (weighted) average of the regrets on each block. This is why we restrict this definition to internally consistent strategies with respect to the actual payoffs. One may compare Definition 3.7 of 0-consistency to Definition 1.2 of $\varepsilon$-calibrated strategies.



3.4 External and internal consistency 

With full monitoring, by linearity of the payoff function, a strategy that is internally consistent is also externally consistent. This property also holds in partial monitoring when we consider regret in terms of actual payoffs:

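Proposition 3.10 For every $\varepsilon>0$ and every finite family $\{x(l),\ l\in L\}$ of $\Delta(I)$, every $(L,\varepsilon)$-internally consistent strategy with respect to the actual payoffs is $\varepsilon$-externally consistent with respect to the actual payoffs, i.e. $\mathbb{P}_{\sigma,\tau}$-as:

$$\limsup_{n\to+\infty}\ \max_{x\in\Delta(I)}W(x,\bar{\mu}_n)-\bar{\rho}_n\le\varepsilon.$$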

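Proof: Let $\varepsilon>0$, $L\subset\Delta(I)$ and $\sigma$ be an $(L,\varepsilon)$-internally consistent strategy with respect to the actual payoffs. Since $s^{-1}(\cdot)$ is convex, for every $x\in\Delta(I)$ the mapping $\mu\mapsto W(x,\mu)$ is convex, and so is the mapping $\mu\mapsto\max_{x\in\Delta(I)}W(x,\mu)$. Since $\bar{\mu}_n=\sum_{l\in L}\frac{|N_n(l)|}{n}\bar{\mu}_n(l)$ and $\bar{\rho}_n=\sum_{l\in L}\frac{|N_n(l)|}{n}\bar{\rho}_n(l)$, this gives

$$\max_{x\in\Delta(I)}W(x,\bar{\mu}_n)-\bar{\rho}_n\le\sum_{l\in L}\frac{|N_n(l)|}{n}\Big(\max_{x\in\Delta(I)}W(x,\bar{\mu}_n(l))-\bar{\rho}_n(l)\Big).$$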



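Therefore, one has

$$\limsup_{n\to+\infty}\ \max_{x\in\Delta(I)}W(x,\bar{\mu}_n)-\bar{\rho}_n\le\limsup_{n\to+\infty}\sum_{l\in L}\frac{|N_n(l)|}{n}\,\varepsilon\le\varepsilon,$$

and so $\sigma$ is $\varepsilon$-externally consistent. □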
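The convexity step in the proof above can be checked numerically. The sketch below (hypothetical names; $f$ stands for $\mu\mapsto\max_{x\in\Delta(I)}W(x,\mu)$ and is replaced by a pointwise maximum of affine maps, hence convex) splits random data into types and compares the external regret $f(\bar{\mu}_n)-\bar{\rho}_n$ with $\sum_{l}\frac{|N_n(l)|}{n}\big(f(\bar{\mu}_n(l))-\bar{\rho}_n(l)\big)$. The inequality comes from $\bar{\mu}_n=\sum_l\frac{|N_n(l)|}{n}\bar{\mu}_n(l)$, the same identity for payoffs, and Jensen's inequality; for a linear $f$ the two sides coincide.

```python
import random


def regret_split(mu, rho, types, f):
    """Return (external, weighted) for the convexity step of Proposition 3.10.

    external = f(mu_bar_n) - rho_bar_n and
    weighted = sum_l |N_n(l)|/n * (f(mu_bar_n(l)) - rho_bar_n(l)),
    where mu[m] is a flag (a list of floats), rho[m] a payoff and types[m] = l_m.
    """
    n = len(types)

    def average(stages, seq):
        return [sum(seq[m][d] for m in stages) / len(stages) for d in range(len(seq[0]))]

    everything = range(n)
    external = f(average(everything, mu)) - sum(rho) / n
    weighted = 0.0
    for l in set(types):
        stages = [m for m in everything if types[m] == l]          # N_n(l)
        rho_bar_l = sum(rho[m] for m in stages) / len(stages)
        weighted += len(stages) / n * (f(average(stages, mu)) - rho_bar_l)
    return external, weighted


def convex_f(v):
    # Pointwise max of affine maps: a stand-in for mu -> max_x W(x, mu).
    return max(v[0] - v[1], 2 * v[1] - 1, 0.0)


rng = random.Random(1)
n = 200
mu = [[rng.random(), rng.random()] for _ in range(n)]
rho = [rng.uniform(-1.0, 1.0) for _ in range(n)]
types = [rng.randrange(3) for _ in range(n)]
external, weighted = regret_split(mu, rho, types, convex_f)
print(external <= weighted + 1e-12)   # True (Jensen's inequality)
```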

Proposition 3.10 holds for the compact case under the assumption that $s$ is closed-convex. Note that the proof relies on the fact that $W$ is convex and the actual payoffs are linear. It is clear that this result does not extend to every evaluation function. Indeed, consider the optimistic function defined (for $\mu\in\mathcal{S}$) by:

$$O(x,\mu)=\sup_{y\in s^{-1}(\mu)}\rho(x,y);$$

then the more information Player 1 gets about the moves of Player 2, the lower he evaluates his payoff. So an internally consistent strategy (i.e. a strategy that is consistent with a more precise knowledge of the moves of Player 2) might not be externally consistent.

Concluding remarks 

In the full monitoring framework, many improvements have been made in the past years about calibration and regret (see for instance [16], [26], [30]). Here, we aimed to clarify the links between the original notions of approachability, internal regret and calibration in order to extend applications (in particular, to get rid of the finiteness of $J$), to define internal regret with signals as calibration over an appropriate space, and to give a proof derived from no-internal regret in full monitoring, itself derived from the approachability of an orthant in this space.